[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436952#comment-16436952 ] ASF GitHub Bot commented on NIFI-1706: -- Github user asfgit closed the pull request at: https://github.com/apache/nifi/pull/2618 > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436951#comment-16436951 ] ASF subversion and git services commented on NIFI-1706: --- Commit 82ac815536d53c00f848f2eae79474035a9eb126 in nifi's branch refs/heads/master from [~patricker] [ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=82ac815 ] NIFI-1706: Extend QueryDatabaseTable to support arbitrary queries - Only include Maximum Value columns in the type map. - Squashed commits in the previous PR - Rebased against the latest master - Added stop method to GenerateTableFetch so that it refreshes the column type map when it gets restarted - Fixed whitespacing around if/for statement - Updated expressionLanguageSupported value since it is not auto-merged correctly This closes #2618. Signed-off-by: Koji Kawamura > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431652#comment-16431652 ] ASF GitHub Bot commented on NIFI-1706: -- Github user patricker commented on the issue: https://github.com/apache/nifi/pull/2618 Building and unit tests for both QueryDatabaseTable and GenerateTableFetch were good. I ran a real test of an MS SQL table using the custom query function and it worked as expected. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431574#comment-16431574 ] ASF GitHub Bot commented on NIFI-1706: -- Github user patricker commented on the issue: https://github.com/apache/nifi/pull/2618 Thanks @ijokarumawak for the fixes. I'm building and will test. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431570#comment-16431570 ] ASF GitHub Bot commented on NIFI-1706: -- Github user patricker closed the pull request at: https://github.com/apache/nifi/pull/2162 > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430406#comment-16430406 ] ASF GitHub Bot commented on NIFI-1706: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r180055717 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java --- @@ -291,20 +291,30 @@ public void setup(final ProcessContext context, boolean shouldCleanCache, FlowFi if (shouldCleanCache){ columnTypeMap.clear(); } + +final List maxValueColumnNameList = Arrays.asList(maxValueColumnNames.toLowerCase().split(",")); +final List maxValueQualifiedColumnNameList = new ArrayList<>(); + +for(String maxValueColumn:maxValueColumnNameList){ +String colKey = getStateKey(tableName, maxValueColumn.trim()); +maxValueQualifiedColumnNameList.add(colKey); +} + for (int i = 1; i <= numCols; i++) { String colName = resultSetMetaData.getColumnName(i).toLowerCase(); String colKey = getStateKey(tableName, colName); + +//only include columns that are part of the maximum value tracking column list + if(!maxValueQualifiedColumnNameList.contains(colKey)){ +continue; +} + int colType = resultSetMetaData.getColumnType(i); columnTypeMap.putIfAbsent(colKey, colType); } -List maxValueColumnNameList = Arrays.asList(maxValueColumnNames.split(",")); - -for(String maxValueColumn:maxValueColumnNameList){ -String colKey = getStateKey(tableName, maxValueColumn.trim().toLowerCase()); -if(!columnTypeMap.containsKey(colKey)){ -throw new ProcessException("Column not found in the table/query specified: " + maxValueColumn); -} +if(maxValueQualifiedColumnNameList.size() > 0 && columnTypeMap.size() != maxValueQualifiedColumnNameList.size()){ --- End diff -- @patricker This check should be implemented as the previous commit. The size of columnTypeMap can be different with GenerateTableFetch when it's configured to resolve table and column names dynamically with FlowFile EL and deals with multiple tables. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430405#comment-16430405 ] ASF GitHub Bot commented on NIFI-1706: -- Github user ijokarumawak commented on the issue: https://github.com/apache/nifi/pull/2162 Hi @patricker I have rebased your PR against the latest master and fixed the unit test failures. And created another PR based on yours. Would you take a look on this? #2618 Would you close this one if it looks to you to make a single combined PR? Thanks! @mattyb149 It'd be appreciated if you can take a look on #2618, too. Since it has updates related to what you've done with #2504 NIFI-4773. Thanks! > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430400#comment-16430400 ] ASF GitHub Bot commented on NIFI-1706: -- GitHub user ijokarumawak opened a pull request: https://github.com/apache/nifi/pull/2618 NIFI-1706: Extend QueryDatabaseTable to support arbitrary queries Thank you for submitting a contribution to Apache NiFi. In order to streamline the review of the contribution we ask you to ensure the following steps have been taken: ### For all changes: - [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message? - [x] Does your PR title start with NIFI- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [x] Has your PR been rebased against the latest commit within the target branch (typically master)? - [ ] Is your initial contribution a single, squashed commit? ### For code changes: - [x] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder? - [ ] Have you written or updated unit tests to verify your changes? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly? - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly? - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties? ### For documentation related changes: - [ ] Have you ensured that format looks appropriate for the output in which it is rendered? ### Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ijokarumawak/nifi nifi-1706 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/nifi/pull/2618.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2618 commit 8155c9d1679a664d102b45ba0e87e7f60d33f26f Author: patricker Date: 2017-09-19T05:50:06Z NIFI-1706: Extend QueryDatabaseTable to support arbitrary queries NIFI-1706 NIFI-1706 - Only include Maximum Value columns in the type map. This closes #2162. Signed-off-by: Koji Kawamura commit 7664e6c503bae2968b6d19523ddb33a5be832f5e Author: Koji Kawamura Date: 2018-04-09T10:48:14Z NIFI-1706: Extend QueryDatabaseTable to support arbitrary queries - Squashed commits in the previous PR - Rebased against the latest master - Added stop method to GenerateTableFetch so that it refreshes the column type map when it gets restarted - Fixed whitespacing around if/for statement - Updated expressionLanguageSupported value since it is not auto-merged correctly > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430286#comment-16430286 ] ASF GitHub Bot commented on NIFI-1706: -- Github user ijokarumawak commented on the issue: https://github.com/apache/nifi/pull/2162 @patricker Thanks for the update, and sorry for my delayed response. I just got the chance to review this today. This has a conflict with #2504 . I can fix compilation level conflict by removing AbstractDatabaseProcessor.onPropertyModified which has been deleted by #2504. However, It seems that we need to update other places, too. I encountered few unit test failures at TestGenerateTableFetch: ``` [ERROR] TestGenerateTableFetch.testAddedRows:202 expected:<4> but was:<0> [ERROR] TestGenerateTableFetch.testAddedRowsRightBounded:402 expected:<4> but was:<0> [ERROR] TestGenerateTableFetch.testAddedRowsTwoTables:276 expected:<1> but was:<0> [ERROR] TestGenerateTableFetch.testAddedRowsWithCustomWhereClause:1112 expected:<5> but was:<0> [ERROR] TestGenerateTableFetch.testMultipleColumnTypeMissing:1250 expected:<1> but was:<0> ``` I'll continue looking at these test failures. It'd be appreciated if you can rebase the PR with the latest master and see how we can fix those, too. Thanks! > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358036#comment-16358036 ] ASF GitHub Bot commented on NIFI-1706: -- Github user ijokarumawak commented on the issue: https://github.com/apache/nifi/pull/2162 @patricker Thanks for the updates! It mostly looks good to me. Did you notice my latest comment? https://github.com/apache/nifi/pull/2162#discussion_r166504892 I forgot to mention you from the comment, so just wanted to check. How do you think about that? That is the only remaining concern for me. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354881#comment-16354881 ] ASF GitHub Bot commented on NIFI-1706: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r166504892 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java --- @@ -249,34 +260,56 @@ public void setup(final ProcessContext context, boolean shouldCleanCache, FlowFi return; } -// Try to fill the columnTypeMap with the types of the desired max-value columns -final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); -final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue(); +// Try to fill the columnTypeMap with the types of the desired max-value columns +final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); +final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue(); +final String sqlQuery = context.getProperty(SQL_QUERY).evaluateAttributeExpressions().getValue(); final DatabaseAdapter dbAdapter = dbAdapters.get(context.getProperty(DB_TYPE).getValue()); try (final Connection con = dbcpService.getConnection(); final Statement st = con.createStatement()) { -// Try a query that returns no rows, for the purposes of getting metadata about the columns. It is possible -// to use DatabaseMetaData.getColumns(), but not all drivers support this, notably the schema-on-read -// approach as in Apache Drill -String query = dbAdapter.getSelectStatement(tableName, maxValueColumnNames, "1 = 0", null, null, null); -ResultSet resultSet = st.executeQuery(query); -ResultSetMetaData resultSetMetaData = resultSet.getMetaData(); -int numCols = resultSetMetaData.getColumnCount(); -if (numCols > 0) { -if (shouldCleanCache) { -columnTypeMap.clear(); -} -for (int i = 1; i <= numCols; i++) { -String colName = resultSetMetaData.getColumnName(i).toLowerCase(); -String colKey = getStateKey(tableName, colName); -int colType = resultSetMetaData.getColumnType(i); -columnTypeMap.putIfAbsent(colKey, colType); +// Try a query that returns no rows, for the purposes of getting metadata about the columns. It is possible +// to use DatabaseMetaData.getColumns(), but not all drivers support this, notably the schema-on-read +// approach as in Apache Drill +String query; + +if(StringUtils.isEmpty(sqlQuery)) { +query = dbAdapter.getSelectStatement(tableName, maxValueColumnNames, "1 = 0", null, null, null); +} else { +StringBuilder sbQuery = getWrappedQuery(sqlQuery, tableName); --- End diff -- I agree with you to avoid adding any logic to build SQL statement. How about adding one more condition at [AbstractDatabaseFetchProcessor.setup where it populates columnTypeMap](https://github.com/apache/nifi/blob/90d7926907b87a832407573ce20bd7ac5ba56bf9/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java#L295 ), in order to filter out columns those are not specified by 'Max Value Columns'? This way, we don't have to modify SQL statement, but can minimize the number of columns to be stored in state. How do you think? Specifically, following lines of code: ``` // This part adds all columns into columnTypeMap for custom query. We want to capture maxValueColumns only. The maxValueColumnNameList below can be used to do so. for (int i = 1; i <= numCols; i++) { String colName = resultSetMetaData.getColumnName(i).toLowerCase(); String colKey = getStateKey(tableName, colName); int colType = resultSetMetaData.getColumnType(i); columnTypeMap.putIfAbsent(colKey, colType); } List maxValueColumnNameList = Arrays.asList(maxValueColumnNames.split
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354859#comment-16354859 ] ASF GitHub Bot commented on NIFI-1706: -- Github user patricker commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r166500502 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java --- @@ -249,34 +260,56 @@ public void setup(final ProcessContext context, boolean shouldCleanCache, FlowFi return; } -// Try to fill the columnTypeMap with the types of the desired max-value columns -final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); -final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue(); +// Try to fill the columnTypeMap with the types of the desired max-value columns +final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); +final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue(); +final String sqlQuery = context.getProperty(SQL_QUERY).evaluateAttributeExpressions().getValue(); final DatabaseAdapter dbAdapter = dbAdapters.get(context.getProperty(DB_TYPE).getValue()); try (final Connection con = dbcpService.getConnection(); final Statement st = con.createStatement()) { -// Try a query that returns no rows, for the purposes of getting metadata about the columns. It is possible -// to use DatabaseMetaData.getColumns(), but not all drivers support this, notably the schema-on-read -// approach as in Apache Drill -String query = dbAdapter.getSelectStatement(tableName, maxValueColumnNames, "1 = 0", null, null, null); -ResultSet resultSet = st.executeQuery(query); -ResultSetMetaData resultSetMetaData = resultSet.getMetaData(); -int numCols = resultSetMetaData.getColumnCount(); -if (numCols > 0) { -if (shouldCleanCache) { -columnTypeMap.clear(); -} -for (int i = 1; i <= numCols; i++) { -String colName = resultSetMetaData.getColumnName(i).toLowerCase(); -String colKey = getStateKey(tableName, colName); -int colType = resultSetMetaData.getColumnType(i); -columnTypeMap.putIfAbsent(colKey, colType); +// Try a query that returns no rows, for the purposes of getting metadata about the columns. It is possible +// to use DatabaseMetaData.getColumns(), but not all drivers support this, notably the schema-on-read +// approach as in Apache Drill +String query; + +if(StringUtils.isEmpty(sqlQuery)) { +query = dbAdapter.getSelectStatement(tableName, maxValueColumnNames, "1 = 0", null, null, null); +} else { +StringBuilder sbQuery = getWrappedQuery(sqlQuery, tableName); --- End diff -- I've checked in fixes for everything except this change. I don't want to put in any more SQL building logic than I already have hard coded into QDB. What if I added a new method to `DatabaseAdapter` for wrapping a `SELECT` statement as a sub query. Input parameters would be similar to the existing method for building a SELECT statement; column list, where clause, order by clause. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifyin
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354854#comment-16354854 ] ASF GitHub Bot commented on NIFI-1706: -- Github user patricker commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r166499522 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java --- @@ -150,8 +151,13 @@ public QueryDatabaseTable() { final List pds = new ArrayList<>(); pds.add(DBCP_SERVICE); pds.add(DB_TYPE); -pds.add(TABLE_NAME); +pds.add(new PropertyDescriptor.Builder() +.fromPropertyDescriptor(TABLE_NAME) +.description("The name of the database table to be queried. When a custom query is used, this property is used to alias the query and appears as an attribute on the FlowFile.") +.build()); --- End diff -- Good catch. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353875#comment-16353875 ] ASF GitHub Bot commented on NIFI-1706: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r166296034 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java --- @@ -150,8 +151,13 @@ public QueryDatabaseTable() { final List pds = new ArrayList<>(); pds.add(DBCP_SERVICE); pds.add(DB_TYPE); -pds.add(TABLE_NAME); +pds.add(new PropertyDescriptor.Builder() +.fromPropertyDescriptor(TABLE_NAME) +.description("The name of the database table to be queried. When a custom query is used, this property is used to alias the query and appears as an attribute on the FlowFile.") +.build()); --- End diff -- Please update `AbstractDatabaseFetchProcessor.onPropertyModified` so that it clears `setupComplete` flag when this TABLE_NAME property is updated, too. Without that, when a custom query is used, `columnTypeMap` will not be populated again if the processor is reconfigured with different table name alias and restarted, that prevent the maxValueColumn to be captured correctly, and produce duplicated query result over and over. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353874#comment-16353874 ] ASF GitHub Bot commented on NIFI-1706: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r166280838 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java --- @@ -249,34 +260,56 @@ public void setup(final ProcessContext context, boolean shouldCleanCache, FlowFi return; } -// Try to fill the columnTypeMap with the types of the desired max-value columns -final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); -final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue(); +// Try to fill the columnTypeMap with the types of the desired max-value columns +final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); +final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue(); +final String sqlQuery = context.getProperty(SQL_QUERY).evaluateAttributeExpressions().getValue(); final DatabaseAdapter dbAdapter = dbAdapters.get(context.getProperty(DB_TYPE).getValue()); try (final Connection con = dbcpService.getConnection(); final Statement st = con.createStatement()) { -// Try a query that returns no rows, for the purposes of getting metadata about the columns. It is possible -// to use DatabaseMetaData.getColumns(), but not all drivers support this, notably the schema-on-read -// approach as in Apache Drill -String query = dbAdapter.getSelectStatement(tableName, maxValueColumnNames, "1 = 0", null, null, null); -ResultSet resultSet = st.executeQuery(query); -ResultSetMetaData resultSetMetaData = resultSet.getMetaData(); -int numCols = resultSetMetaData.getColumnCount(); -if (numCols > 0) { -if (shouldCleanCache) { -columnTypeMap.clear(); -} -for (int i = 1; i <= numCols; i++) { -String colName = resultSetMetaData.getColumnName(i).toLowerCase(); -String colKey = getStateKey(tableName, colName); -int colType = resultSetMetaData.getColumnType(i); -columnTypeMap.putIfAbsent(colKey, colType); +// Try a query that returns no rows, for the purposes of getting metadata about the columns. It is possible +// to use DatabaseMetaData.getColumns(), but not all drivers support this, notably the schema-on-read +// approach as in Apache Drill +String query; + +if(StringUtils.isEmpty(sqlQuery)) { +query = dbAdapter.getSelectStatement(tableName, maxValueColumnNames, "1 = 0", null, null, null); +} else { +StringBuilder sbQuery = getWrappedQuery(sqlQuery, tableName); +sbQuery.append(" WHERE 1=0"); + +query = sbQuery.toString(); +} + +ResultSet resultSet = st.executeQuery(query); +ResultSetMetaData resultSetMetaData = resultSet.getMetaData(); +int numCols = resultSetMetaData.getColumnCount(); +if (numCols > 0) { +if (shouldCleanCache){ +columnTypeMap.clear(); +} +for (int i = 1; i <= numCols; i++) { +String colName = resultSetMetaData.getColumnName(i).toLowerCase(); +String colKey = getStateKey(tableName, colName); +int colType = resultSetMetaData.getColumnType(i); +columnTypeMap.putIfAbsent(colKey, colType); +} + +List maxValueColumnNameList = org.apache.commons.lang3.StringUtils.isEmpty(maxValueColumnNames) +? null +: Arrays.asList(maxValueColumnNames.split("\\s*,\\s*")); --- End diff -- I guess the aim of the regex `\s*,\s` is to split and trim whitespaces. But it leaves whitespaces at the head and tail. I'd suggest simply split with `,` and trim it in the for loop below before `toLowerCase()`. > Extend QueryDa
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353878#comment-16353878 ] ASF GitHub Bot commented on NIFI-1706: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r166294484 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java --- @@ -249,34 +260,56 @@ public void setup(final ProcessContext context, boolean shouldCleanCache, FlowFi return; } -// Try to fill the columnTypeMap with the types of the desired max-value columns -final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); -final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue(); +// Try to fill the columnTypeMap with the types of the desired max-value columns +final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); +final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue(); +final String sqlQuery = context.getProperty(SQL_QUERY).evaluateAttributeExpressions().getValue(); final DatabaseAdapter dbAdapter = dbAdapters.get(context.getProperty(DB_TYPE).getValue()); try (final Connection con = dbcpService.getConnection(); final Statement st = con.createStatement()) { -// Try a query that returns no rows, for the purposes of getting metadata about the columns. It is possible -// to use DatabaseMetaData.getColumns(), but not all drivers support this, notably the schema-on-read -// approach as in Apache Drill -String query = dbAdapter.getSelectStatement(tableName, maxValueColumnNames, "1 = 0", null, null, null); -ResultSet resultSet = st.executeQuery(query); -ResultSetMetaData resultSetMetaData = resultSet.getMetaData(); -int numCols = resultSetMetaData.getColumnCount(); -if (numCols > 0) { -if (shouldCleanCache) { -columnTypeMap.clear(); -} -for (int i = 1; i <= numCols; i++) { -String colName = resultSetMetaData.getColumnName(i).toLowerCase(); -String colKey = getStateKey(tableName, colName); -int colType = resultSetMetaData.getColumnType(i); -columnTypeMap.putIfAbsent(colKey, colType); +// Try a query that returns no rows, for the purposes of getting metadata about the columns. It is possible +// to use DatabaseMetaData.getColumns(), but not all drivers support this, notably the schema-on-read +// approach as in Apache Drill +String query; + +if(StringUtils.isEmpty(sqlQuery)) { +query = dbAdapter.getSelectStatement(tableName, maxValueColumnNames, "1 = 0", null, null, null); +} else { +StringBuilder sbQuery = getWrappedQuery(sqlQuery, tableName); --- End diff -- Since `getWrappedQuery` fetches all available columns in the sub-query by '*', subsequent loop stores all column values into the managed state. We should avoid storing unnecessary values into the state by either specifying the maxValueColumns to this query, or do some filtering in the loop below. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353873#comment-16353873 ] ASF GitHub Bot commented on NIFI-1706: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r166277383 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java --- @@ -249,34 +260,56 @@ public void setup(final ProcessContext context, boolean shouldCleanCache, FlowFi return; } -// Try to fill the columnTypeMap with the types of the desired max-value columns -final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); -final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue(); +// Try to fill the columnTypeMap with the types of the desired max-value columns +final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); +final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue(); +final String sqlQuery = context.getProperty(SQL_QUERY).evaluateAttributeExpressions().getValue(); --- End diff -- Wrong indent. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353877#comment-16353877 ] ASF GitHub Bot commented on NIFI-1706: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r166277436 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java --- @@ -249,34 +260,56 @@ public void setup(final ProcessContext context, boolean shouldCleanCache, FlowFi return; } -// Try to fill the columnTypeMap with the types of the desired max-value columns -final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); -final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue(); +// Try to fill the columnTypeMap with the types of the desired max-value columns +final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); +final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue(); +final String sqlQuery = context.getProperty(SQL_QUERY).evaluateAttributeExpressions().getValue(); final DatabaseAdapter dbAdapter = dbAdapters.get(context.getProperty(DB_TYPE).getValue()); try (final Connection con = dbcpService.getConnection(); final Statement st = con.createStatement()) { -// Try a query that returns no rows, for the purposes of getting metadata about the columns. It is possible -// to use DatabaseMetaData.getColumns(), but not all drivers support this, notably the schema-on-read -// approach as in Apache Drill -String query = dbAdapter.getSelectStatement(tableName, maxValueColumnNames, "1 = 0", null, null, null); -ResultSet resultSet = st.executeQuery(query); -ResultSetMetaData resultSetMetaData = resultSet.getMetaData(); -int numCols = resultSetMetaData.getColumnCount(); -if (numCols > 0) { -if (shouldCleanCache) { -columnTypeMap.clear(); -} -for (int i = 1; i <= numCols; i++) { -String colName = resultSetMetaData.getColumnName(i).toLowerCase(); -String colKey = getStateKey(tableName, colName); -int colType = resultSetMetaData.getColumnType(i); -columnTypeMap.putIfAbsent(colKey, colType); +// Try a query that returns no rows, for the purposes of getting metadata about the columns. It is possible --- End diff -- Wrong indent. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353876#comment-16353876 ] ASF GitHub Bot commented on NIFI-1706: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r166278036 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java --- @@ -249,34 +260,56 @@ public void setup(final ProcessContext context, boolean shouldCleanCache, FlowFi return; } -// Try to fill the columnTypeMap with the types of the desired max-value columns -final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); -final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue(); +// Try to fill the columnTypeMap with the types of the desired max-value columns +final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); +final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue(); +final String sqlQuery = context.getProperty(SQL_QUERY).evaluateAttributeExpressions().getValue(); final DatabaseAdapter dbAdapter = dbAdapters.get(context.getProperty(DB_TYPE).getValue()); try (final Connection con = dbcpService.getConnection(); final Statement st = con.createStatement()) { -// Try a query that returns no rows, for the purposes of getting metadata about the columns. It is possible -// to use DatabaseMetaData.getColumns(), but not all drivers support this, notably the schema-on-read -// approach as in Apache Drill -String query = dbAdapter.getSelectStatement(tableName, maxValueColumnNames, "1 = 0", null, null, null); -ResultSet resultSet = st.executeQuery(query); -ResultSetMetaData resultSetMetaData = resultSet.getMetaData(); -int numCols = resultSetMetaData.getColumnCount(); -if (numCols > 0) { -if (shouldCleanCache) { -columnTypeMap.clear(); -} -for (int i = 1; i <= numCols; i++) { -String colName = resultSetMetaData.getColumnName(i).toLowerCase(); -String colKey = getStateKey(tableName, colName); -int colType = resultSetMetaData.getColumnType(i); -columnTypeMap.putIfAbsent(colKey, colType); +// Try a query that returns no rows, for the purposes of getting metadata about the columns. It is possible +// to use DatabaseMetaData.getColumns(), but not all drivers support this, notably the schema-on-read +// approach as in Apache Drill +String query; + +if(StringUtils.isEmpty(sqlQuery)) { +query = dbAdapter.getSelectStatement(tableName, maxValueColumnNames, "1 = 0", null, null, null); +} else { +StringBuilder sbQuery = getWrappedQuery(sqlQuery, tableName); +sbQuery.append(" WHERE 1=0"); + +query = sbQuery.toString(); +} + +ResultSet resultSet = st.executeQuery(query); +ResultSetMetaData resultSetMetaData = resultSet.getMetaData(); +int numCols = resultSetMetaData.getColumnCount(); +if (numCols > 0) { +if (shouldCleanCache){ +columnTypeMap.clear(); +} +for (int i = 1; i <= numCols; i++) { +String colName = resultSetMetaData.getColumnName(i).toLowerCase(); +String colKey = getStateKey(tableName, colName); +int colType = resultSetMetaData.getColumnType(i); +columnTypeMap.putIfAbsent(colKey, colType); +} + +List maxValueColumnNameList = org.apache.commons.lang3.StringUtils.isEmpty(maxValueColumnNames) --- End diff -- I think we can use `org.apache.nifi.util.StringUtils` here instead, which is already imported. Moreover, we can remove this emptiness check because it's already checked at the beginning of this method. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352075#comment-16352075 ] ASF GitHub Bot commented on NIFI-1706: -- Github user patricker commented on the issue: https://github.com/apache/nifi/pull/2162 @ijokarumawak Updated. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336907#comment-16336907 ] ASF GitHub Bot commented on NIFI-1706: -- Github user patricker commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r163454074 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java --- @@ -156,10 +157,22 @@ .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) .build(); +public static final PropertyDescriptor SQL_QUERY = new PropertyDescriptor.Builder() +.name("db-fetch-sql-query") +.displayName("Custom Query") +.description("A custom SQL query used to retrieve data. Instead of building a SQL query from " ++ "other properties, this query will be used. Query must have no WHERE or ORDER BY statements. " ++ "If a WHERE clause is needed use a sub-query or the 'Additional WHERE clause' property.") --- End diff -- Sounds like a good idea. Users still won't be able to include an ORDER BY, as those are not allowed in sub queries in every database I've worked with. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336812#comment-16336812 ] ASF GitHub Bot commented on NIFI-1706: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r163441205 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java --- @@ -366,10 +401,22 @@ public void onTrigger(final ProcessContext context, final ProcessSessionFactory protected String getQuery(DatabaseAdapter dbAdapter, String tableName, String columnNames, List maxValColumnNames, String customWhereClause, Map stateMap) { + +return getQuery(dbAdapter, tableName, "", columnNames, maxValColumnNames, customWhereClause, stateMap); --- End diff -- `null` may be preferable over the empty string object for `sqlQuery`. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336813#comment-16336813 ] ASF GitHub Bot commented on NIFI-1706: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r163442381 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java --- @@ -156,10 +157,22 @@ .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) .build(); +public static final PropertyDescriptor SQL_QUERY = new PropertyDescriptor.Builder() +.name("db-fetch-sql-query") +.displayName("Custom Query") +.description("A custom SQL query used to retrieve data. Instead of building a SQL query from " ++ "other properties, this query will be used. Query must have no WHERE or ORDER BY statements. " ++ "If a WHERE clause is needed use a sub-query or the 'Additional WHERE clause' property.") --- End diff -- Can we generate a sub-query from this property by simply wrapping it with braces and add the specified `Table name` as the alias name of it? If we do so, we can use Expression Language and incoming FlowFiles more creatively with the single QueryDatabaseTable processor instance. Also it will allow user to define WHERE and ORDER BY statement here. With that approach, we can keep using the `Table Name` property as it is, and the custom validation is not required. This `Custom Query` property can be truly an optional property. Current proposed approach use a constant "NiFi_QDBT" table name for state keys, and it can limit the use-case. How do you think? > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336788#comment-16336788 ] Koji Kawamura commented on NIFI-1706: - Thanks [~patricker]. I thought about that situation where NiFi admin is not allowed to create VIEWs, too. By looking at the proposed change, and other alternatives you listed in the previous comment. I agree with you, extending QueryDatabaseTable looks the way to go. I will try to find a time to review the change. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336746#comment-16336746 ] Peter Wicks commented on NIFI-1706: --- [~ijokarumawak] I've written many views for use with QueryDatabaseTable, and it's my preferred approach. The use case for allowing custom SQL is more about permissions. In some cases I have read-only permission to the server/database, and the group that owns the system is not willing to allow me to create views; generally it's a combination of they don't have time to support me + every other team that writes reports (MS SQL Reporting Services, Tableau, etc...) is successful with their currently structure, so why can't I be? There are alternatives to adding this functionality to QueryDatabaseTable: * Create a brand new processor that is a cross between QueryDatabaseTable and ExecuteSQL * Extend ExecuteSQL to include optional state tracking features. It could be a more manual process where users have to provide initial max values and then insert the max values into the query by hand using EL. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334006#comment-16334006 ] Koji Kawamura commented on NIFI-1706: - [~pee...@gmail.com] You can define select queries in the target RDBMS as VIEW. This is an example of MySQL, you can use JOIN in VIEW to provide rich data. https://stackoverflow.com/questions/12352048/mysql-create-view-joining-two-tables > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333992#comment-16333992 ] Paul Bormans commented on NIFI-1706: Hello Koji, I'm not familiar with the Views concept (have been "out" for some some since i wrote the ticket), but i still feel that the processor is not very useful if the aim is to extract "rich" data from an rdbms without support for a more complex query. The only alternative that you then have is to extract on per-table basis and join somewhere else... Paul > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333933#comment-16333933 ] Koji Kawamura commented on NIFI-1706: - Hi [~pee...@gmail.com] [~patricker], I found a similar JIRA NIFI-2200 before I found this one. I closed NIFI-2200 as I believe VIEWs can replace sub-query needs. VIEWs can cover the improvements requested by this JIRA, too. I personally prefer to keep a processor implementation as simple as possible for sustainable maintainability. I appreciate Peter's work but wanted to discuss. How do you think? Thank you. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks >Priority: Major > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211993#comment-16211993 ] ASF GitHub Bot commented on NIFI-1706: -- Github user mattyb149 commented on the issue: https://github.com/apache/nifi/pull/2162 Yes, sorry for the delay, got swamped > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211979#comment-16211979 ] ASF GitHub Bot commented on NIFI-1706: -- Github user patricker commented on the issue: https://github.com/apache/nifi/pull/2162 @mattyb149 Can you take another look? I think I've addressed all of your concerns. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198363#comment-16198363 ] ASF GitHub Bot commented on NIFI-1706: -- Github user patricker commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r143664137 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java --- @@ -240,7 +254,14 @@ public void setup(final ProcessContext context) { // Try a query that returns no rows, for the purposes of getting metadata about the columns. It is possible // to use DatabaseMetaData.getColumns(), but not all drivers support this, notably the schema-on-read // approach as in Apache Drill -String query = dbAdapter.getSelectStatement(tableName, maxValueColumnNames, "1 = 0", null, null, null); +String query; + +if(StringUtils.isEmpty(sqlQuery)) { +query = dbAdapter.getSelectStatement(tableName, maxValueColumnNames, "1 = 0", null, null, null); +} else { +query=sqlQuery + " WHERE 1=0"; --- End diff -- @mattyb149 > If they specify a max-value column in the other property, and it is not available in this query, then the getSelectStatement() below doesn't seem like it would work as expected What about if I take the list of maxValueColumns and use those to build the equivelant of the "1=0" condition. For example, if our max value column names are `x` and `y` I could build a where expression `x <> x AND y <> y`. This would ensure the column names were present, and if they aren't an exception would be thrown. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198357#comment-16198357 ] ASF GitHub Bot commented on NIFI-1706: -- Github user patricker commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r143663025 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java --- @@ -190,8 +222,11 @@ public void onTrigger(final ProcessContext context, final ProcessSessionFactory final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); final DatabaseAdapter dbAdapter = dbAdapters.get(context.getProperty(DB_TYPE).getValue()); -final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions().getValue(); + +final String propTableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions().getValue(); +final String tableName = StringUtils.isEmpty(propTableName) ? ARBITRARY_SQL_TABLE_NAME : propTableName; final String columnNames = context.getProperty(COLUMN_NAMES).evaluateAttributeExpressions().getValue(); +final String sqlQuery = context.getProperty(SQL_QUERY).getValue(); --- End diff -- Makes sense to me. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198347#comment-16198347 ] ASF GitHub Bot commented on NIFI-1706: -- Github user patricker commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r143661252 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java --- @@ -231,7 +243,9 @@ public void setup(final ProcessContext context) { // Try to fill the columnTypeMap with the types of the desired max-value columns final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); -final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions().getValue(); +final String propTableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions().getValue(); +final String tableName = org.apache.commons.lang3.StringUtils.isEmpty(propTableName) ? ARBITRARY_SQL_TABLE_NAME : propTableName; --- End diff -- Just some copy/paste action from another chunk of code. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195219#comment-16195219 ] ASF GitHub Bot commented on NIFI-1706: -- Github user mattyb149 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r143279372 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java --- @@ -175,6 +181,32 @@ public QueryDatabaseTable() { return propDescriptors; } +@Override +protected Collection customValidate(ValidationContext validationContext) { +final List results = new ArrayList<>(super.customValidate(validationContext)); + +final String tableName = validationContext.getProperty(TABLE_NAME).getValue(); +final String sqlQuery = validationContext.getProperty(SQL_QUERY).getValue(); + +if(!StringUtils.isEmpty(sqlQuery) && !StringUtils.isEmpty(tableName)){ +results.add(new ValidationResult.Builder() +.valid(false) +.subject("SQL Query") +.explanation("SQL Query and Table Name can't both be specified at the same time.") --- End diff -- This might need some elaboration, in case the user wouldn't necessarily understand why they can't both be specified (due to EL support in the Table Name property, e.g.) > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195220#comment-16195220 ] ASF GitHub Bot commented on NIFI-1706: -- Github user mattyb149 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r143280289 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java --- @@ -175,6 +181,32 @@ public QueryDatabaseTable() { return propDescriptors; } +@Override +protected Collection customValidate(ValidationContext validationContext) { +final List results = new ArrayList<>(super.customValidate(validationContext)); + +final String tableName = validationContext.getProperty(TABLE_NAME).getValue(); +final String sqlQuery = validationContext.getProperty(SQL_QUERY).getValue(); + +if(!StringUtils.isEmpty(sqlQuery) && !StringUtils.isEmpty(tableName)){ +results.add(new ValidationResult.Builder() +.valid(false) +.subject("SQL Query") --- End diff -- This should match either the "Arbitrary/Custom Query" or "Table Name" property or both I think > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195218#comment-16195218 ] ASF GitHub Bot commented on NIFI-1706: -- Github user mattyb149 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r143290546 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java --- @@ -175,6 +181,32 @@ public QueryDatabaseTable() { return propDescriptors; } +@Override +protected Collection customValidate(ValidationContext validationContext) { +final List results = new ArrayList<>(super.customValidate(validationContext)); + +final String tableName = validationContext.getProperty(TABLE_NAME).getValue(); +final String sqlQuery = validationContext.getProperty(SQL_QUERY).getValue(); + +if(!StringUtils.isEmpty(sqlQuery) && !StringUtils.isEmpty(tableName)){ +results.add(new ValidationResult.Builder() +.valid(false) +.subject("SQL Query") +.explanation("SQL Query and Table Name can't both be specified at the same time.") +.build()); +} + +if(!StringUtils.isEmpty(sqlQuery) && isDynamicMaxValues){ --- End diff -- Is this a hard requirement? Expression Language support (since incoming connections are not allowed for QueryDatabaseTable) is meant to support Variable Registry and environment variables, which could be used in a custom query if it also included such an EL statement? > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195216#comment-16195216 ] ASF GitHub Bot commented on NIFI-1706: -- Github user mattyb149 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r143289913 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java --- @@ -77,7 +80,8 @@ @InputRequirement(Requirement.INPUT_FORBIDDEN) @Tags({"sql", "select", "jdbc", "query", "database"}) @SeeAlso({GenerateTableFetch.class, ExecuteSQL.class}) -@CapabilityDescription("Generates and executes a SQL select query to fetch all rows whose values in the specified Maximum Value column(s) are larger than the " +@CapabilityDescription("Generates a SQL select query, or uses a provided statement, and executes it to fetch all rows whose values in the specified " --- End diff -- Nitpick, but an extra space between "or uses" > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195215#comment-16195215 ] ASF GitHub Bot commented on NIFI-1706: -- Github user mattyb149 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r143289820 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java --- @@ -231,7 +243,9 @@ public void setup(final ProcessContext context) { // Try to fill the columnTypeMap with the types of the desired max-value columns final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); -final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions().getValue(); +final String propTableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions().getValue(); +final String tableName = org.apache.commons.lang3.StringUtils.isEmpty(propTableName) ? ARBITRARY_SQL_TABLE_NAME : propTableName; --- End diff -- Why the fully-qualified StringUtils class? If we have both (NiFi and Commons Lang), can we get rid of one? > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195217#comment-16195217 ] ASF GitHub Bot commented on NIFI-1706: -- Github user mattyb149 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r143290669 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java --- @@ -190,8 +222,11 @@ public void onTrigger(final ProcessContext context, final ProcessSessionFactory final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class); final DatabaseAdapter dbAdapter = dbAdapters.get(context.getProperty(DB_TYPE).getValue()); -final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions().getValue(); + +final String propTableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions().getValue(); +final String tableName = StringUtils.isEmpty(propTableName) ? ARBITRARY_SQL_TABLE_NAME : propTableName; final String columnNames = context.getProperty(COLUMN_NAMES).evaluateAttributeExpressions().getValue(); +final String sqlQuery = context.getProperty(SQL_QUERY).getValue(); --- End diff -- Should this support Expression Language for Variable Registry support (see my comment above)? > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195214#comment-16195214 ] ASF GitHub Bot commented on NIFI-1706: -- Github user mattyb149 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r143279020 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java --- @@ -155,10 +155,22 @@ .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) .build(); +public static final PropertyDescriptor SQL_QUERY = new PropertyDescriptor.Builder() +.name("db-fetch-sql-query") +.displayName("Arbitrary Query") --- End diff -- Possibly call this Custom Query? Arbitrary is... well arbitrary :P and might cause some confusion for the user. Plus the description refers to it as a custom query > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195213#comment-16195213 ] ASF GitHub Bot commented on NIFI-1706: -- Github user mattyb149 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2162#discussion_r143278853 --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java --- @@ -155,10 +155,22 @@ .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) .build(); +public static final PropertyDescriptor SQL_QUERY = new PropertyDescriptor.Builder() +.name("db-fetch-sql-query") +.displayName("Arbitrary Query") +.description("A custom SQL query used to retrieve data. Instead of building a SQL query from " ++ "other properties, this query will be used. Query must have no WHERE or ORDER BY statements. " ++ "If a WHERE clause is needed use a sub-query or the 'Additional WHERE clause' property.") --- End diff -- Do we need more doc here around max-value columns? If they specify a max-value column in the other property, and it is not available in this query, then the getSelectStatement() below doesn't seem like it would work as expected > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171151#comment-16171151 ] ASF GitHub Bot commented on NIFI-1706: -- GitHub user patricker opened a pull request: https://github.com/apache/nifi/pull/2162 NIFI-1706 Extend QueryDatabaseTable to support arbitrary queries ### For all changes: - [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message? - [x] Does your PR title start with NIFI- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [x] Has your PR been rebased against the latest commit within the target branch (typically master)? - [x] Is your initial contribution a single, squashed commit? ### For code changes: - [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder? - [x] Have you written or updated unit tests to verify your changes? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [x] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties? ### For documentation related changes: - [ ] Have you ensured that format looks appropriate for the output in which it is rendered? ### Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible. You can merge this pull request into a Git repository by running: $ git pull https://github.com/patricker/nifi NIFI-1706 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/nifi/pull/2162.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2162 commit 5169001108051c395be8cee1a08c7dc1fbf1b0f5 Author: patricker Date: 2017-09-19T05:50:06Z NIFI-1706 > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries
[ https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169771#comment-16169771 ] Peter Wicks commented on NIFI-1706: --- I have this written up with test cases, just working through some more hands on use case testing. > Extend QueryDatabaseTable to support arbitrary queries > -- > > Key: NIFI-1706 > URL: https://issues.apache.org/jira/browse/NIFI-1706 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Affects Versions: 1.4.0 >Reporter: Paul Bormans >Assignee: Peter Wicks > Labels: features > > The QueryDatabaseTable is able to observe a configured database table for new > rows and yield these into the flowfile. The model of an rdbms however is > often (if not always) normalized so you would need to join various tables in > order to "flatten" the data into useful events for a processing pipeline as > can be build with nifi or various tools within the hadoop ecosystem. > The request is to extend the processor to specify an arbitrary sql query > instead of specifying the table name + columns. > In addition (this may be another issue?) it is desired to limit the number of > rows returned per run. Not just because of bandwidth issue's from the nifi > pipeline onwards but mainly because huge databases may not be able to return > so many records within a reasonable time. -- This message was sent by Atlassian JIRA (v6.4.14#64029)