subject:"\[jira\] \[Commented\] \(NIFI\-1706\) Extend QueryDatabaseTable to support arbitrary queries"

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-04-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436952#comment-16436952
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/2618


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-04-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436951#comment-16436951
 ] 

ASF subversion and git services commented on NIFI-1706:
---

Commit 82ac815536d53c00f848f2eae79474035a9eb126 in nifi's branch 
refs/heads/master from [~patricker]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=82ac815 ]

NIFI-1706: Extend QueryDatabaseTable to support arbitrary queries

- Only include Maximum Value columns in the type map.
- Squashed commits in the previous PR
- Rebased against the latest master
- Added stop method to GenerateTableFetch so that it refreshes the
column type map when it gets restarted
- Fixed whitespacing around if/for statement
- Updated expressionLanguageSupported value since it is not auto-merged
correctly

This closes #2618.

Signed-off-by: Koji Kawamura 


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431652#comment-16431652
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user patricker commented on the issue:

https://github.com/apache/nifi/pull/2618
  
Building and unit tests for both QueryDatabaseTable and GenerateTableFetch 
were good.
I ran a real test of an MS SQL table using the custom query function and it 
worked as expected.


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431574#comment-16431574
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user patricker commented on the issue:

https://github.com/apache/nifi/pull/2618
  
Thanks @ijokarumawak for the fixes. I'm building and will test.


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431570#comment-16431570
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user patricker closed the pull request at:

https://github.com/apache/nifi/pull/2162


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430406#comment-16430406
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r180055717
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java
 ---
@@ -291,20 +291,30 @@ public void setup(final ProcessContext context, 
boolean shouldCleanCache, FlowFi
 if (shouldCleanCache){
 columnTypeMap.clear();
 }
+
+final List maxValueColumnNameList = 
Arrays.asList(maxValueColumnNames.toLowerCase().split(","));
+final List maxValueQualifiedColumnNameList = 
new ArrayList<>();
+
+for(String maxValueColumn:maxValueColumnNameList){
+String colKey = getStateKey(tableName, 
maxValueColumn.trim());
+maxValueQualifiedColumnNameList.add(colKey);
+}
+
 for (int i = 1; i <= numCols; i++) {
 String colName = 
resultSetMetaData.getColumnName(i).toLowerCase();
 String colKey = getStateKey(tableName, colName);
+
+//only include columns that are part of the 
maximum value tracking column list
+
if(!maxValueQualifiedColumnNameList.contains(colKey)){
+continue;
+}
+
 int colType = resultSetMetaData.getColumnType(i);
 columnTypeMap.putIfAbsent(colKey, colType);
 }
 
-List maxValueColumnNameList = 
Arrays.asList(maxValueColumnNames.split(","));
-
-for(String maxValueColumn:maxValueColumnNameList){
-String colKey = getStateKey(tableName, 
maxValueColumn.trim().toLowerCase());
-if(!columnTypeMap.containsKey(colKey)){
-throw new ProcessException("Column not found 
in the table/query specified: " + maxValueColumn);
-}
+if(maxValueQualifiedColumnNameList.size() > 0 && 
columnTypeMap.size() != maxValueQualifiedColumnNameList.size()){
--- End diff --

@patricker This check should be implemented as the previous commit. The 
size of columnTypeMap can be different with GenerateTableFetch when it's 
configured to resolve table and column names dynamically with FlowFile EL and 
deals with multiple tables.


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430405#comment-16430405
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/2162
  
Hi @patricker 

I have rebased your PR against the latest master and fixed the unit test 
failures. And created another PR based on yours. Would you take a look on this? 
#2618 Would you close this one if it looks to you to make a single combined PR? 
Thanks!

@mattyb149 It'd be appreciated if you can take a look on #2618, too. Since 
it has updates related to what you've done with #2504 NIFI-4773. Thanks!


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430400#comment-16430400
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

GitHub user ijokarumawak opened a pull request:

https://github.com/apache/nifi/pull/2618

NIFI-1706: Extend QueryDatabaseTable to support arbitrary queries

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [x] Does your PR title start with NIFI- where  is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.

- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [x] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijokarumawak/nifi nifi-1706

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2618.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2618


commit 8155c9d1679a664d102b45ba0e87e7f60d33f26f
Author: patricker 
Date:   2017-09-19T05:50:06Z

NIFI-1706: Extend QueryDatabaseTable to support arbitrary queries

NIFI-1706

NIFI-1706 - Only include Maximum Value columns in the type map.

This closes #2162.

Signed-off-by: Koji Kawamura 

commit 7664e6c503bae2968b6d19523ddb33a5be832f5e
Author: Koji Kawamura 
Date:   2018-04-09T10:48:14Z

NIFI-1706: Extend QueryDatabaseTable to support arbitrary queries

- Squashed commits in the previous PR
- Rebased against the latest master
- Added stop method to GenerateTableFetch so that it refreshes the
column type map when it gets restarted
- Fixed whitespacing around if/for statement
- Updated expressionLanguageSupported value since it is not auto-merged
correctly




> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430286#comment-16430286
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/2162
  
@patricker Thanks for the update, and sorry for my delayed response. I just 
got the chance to review this today.

This has a conflict with #2504 . I can fix compilation level conflict by 
removing AbstractDatabaseProcessor.onPropertyModified which has been deleted by 
#2504.
However, It seems that we need to update other places, too. I encountered 
few unit test failures at 
TestGenerateTableFetch:
```
[ERROR]   TestGenerateTableFetch.testAddedRows:202 expected:<4> but was:<0>
[ERROR]   TestGenerateTableFetch.testAddedRowsRightBounded:402 expected:<4> 
but was:<0>
[ERROR]   TestGenerateTableFetch.testAddedRowsTwoTables:276 expected:<1> 
but was:<0>
[ERROR]   TestGenerateTableFetch.testAddedRowsWithCustomWhereClause:1112 
expected:<5> but was:<0>
[ERROR]   TestGenerateTableFetch.testMultipleColumnTypeMissing:1250 
expected:<1> but was:<0>
```
I'll continue looking at these test failures. It'd be appreciated if you 
can rebase the PR with the latest master and see how we can fix those, too. 
Thanks!


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-02-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358036#comment-16358036
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/2162
  
@patricker Thanks for the updates! It mostly looks good to me. Did you 
notice my latest comment?
https://github.com/apache/nifi/pull/2162#discussion_r166504892
I forgot to mention you from the comment, so just wanted to check. How do 
you think about that? That is the only remaining concern for me.


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-02-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354881#comment-16354881
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r166504892
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java
 ---
@@ -249,34 +260,56 @@ public void setup(final ProcessContext context, 
boolean shouldCleanCache, FlowFi
 return;
 }
 
-// Try to fill the columnTypeMap with the types of the desired 
max-value columns
-final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
-final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
+// Try to fill the columnTypeMap with the types of the desired 
max-value columns
+final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
+final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
+final String sqlQuery = 
context.getProperty(SQL_QUERY).evaluateAttributeExpressions().getValue();
 
 final DatabaseAdapter dbAdapter = 
dbAdapters.get(context.getProperty(DB_TYPE).getValue());
 try (final Connection con = dbcpService.getConnection();
  final Statement st = con.createStatement()) {
 
-// Try a query that returns no rows, for the purposes of 
getting metadata about the columns. It is possible
-// to use DatabaseMetaData.getColumns(), but not all 
drivers support this, notably the schema-on-read
-// approach as in Apache Drill
-String query = dbAdapter.getSelectStatement(tableName, 
maxValueColumnNames, "1 = 0", null, null, null);
-ResultSet resultSet = st.executeQuery(query);
-ResultSetMetaData resultSetMetaData = 
resultSet.getMetaData();
-int numCols = resultSetMetaData.getColumnCount();
-if (numCols > 0) {
-if (shouldCleanCache) {
-columnTypeMap.clear();
-}
-for (int i = 1; i <= numCols; i++) {
-String colName = 
resultSetMetaData.getColumnName(i).toLowerCase();
-String colKey = getStateKey(tableName, colName);
-int colType = resultSetMetaData.getColumnType(i);
-columnTypeMap.putIfAbsent(colKey, colType);
+// Try a query that returns no rows, for the purposes of 
getting metadata about the columns. It is possible
+// to use DatabaseMetaData.getColumns(), but not all drivers 
support this, notably the schema-on-read
+// approach as in Apache Drill
+String query;
+
+if(StringUtils.isEmpty(sqlQuery)) {
+query = dbAdapter.getSelectStatement(tableName, 
maxValueColumnNames, "1 = 0", null, null, null);
+} else {
+StringBuilder sbQuery = getWrappedQuery(sqlQuery, 
tableName);
--- End diff --

I agree with you to avoid adding any logic to build SQL statement. How 
about adding one more condition at [AbstractDatabaseFetchProcessor.setup where 
it populates 
columnTypeMap](https://github.com/apache/nifi/blob/90d7926907b87a832407573ce20bd7ac5ba56bf9/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java#L295
), in order to filter out columns those are not specified by 'Max Value 
Columns'? This way, we don't have to modify SQL statement, but can minimize the 
number of columns to be stored in state. How do you think?

Specifically, following lines of code:
```
// This part adds all columns into columnTypeMap for custom query. We want 
to capture maxValueColumns only. The maxValueColumnNameList below can be used 
to do so.
for (int i = 1; i <= numCols; i++) {
String colName = 
resultSetMetaData.getColumnName(i).toLowerCase();
String colKey = getStateKey(tableName, colName);
int colType = resultSetMetaData.getColumnType(i);
columnTypeMap.putIfAbsent(colKey, colType);
}

List maxValueColumnNameList = 
Arrays.asList(maxValueColumnNames.split

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-02-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354859#comment-16354859
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user patricker commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r166500502
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java
 ---
@@ -249,34 +260,56 @@ public void setup(final ProcessContext context, 
boolean shouldCleanCache, FlowFi
 return;
 }
 
-// Try to fill the columnTypeMap with the types of the desired 
max-value columns
-final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
-final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
+// Try to fill the columnTypeMap with the types of the desired 
max-value columns
+final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
+final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
+final String sqlQuery = 
context.getProperty(SQL_QUERY).evaluateAttributeExpressions().getValue();
 
 final DatabaseAdapter dbAdapter = 
dbAdapters.get(context.getProperty(DB_TYPE).getValue());
 try (final Connection con = dbcpService.getConnection();
  final Statement st = con.createStatement()) {
 
-// Try a query that returns no rows, for the purposes of 
getting metadata about the columns. It is possible
-// to use DatabaseMetaData.getColumns(), but not all 
drivers support this, notably the schema-on-read
-// approach as in Apache Drill
-String query = dbAdapter.getSelectStatement(tableName, 
maxValueColumnNames, "1 = 0", null, null, null);
-ResultSet resultSet = st.executeQuery(query);
-ResultSetMetaData resultSetMetaData = 
resultSet.getMetaData();
-int numCols = resultSetMetaData.getColumnCount();
-if (numCols > 0) {
-if (shouldCleanCache) {
-columnTypeMap.clear();
-}
-for (int i = 1; i <= numCols; i++) {
-String colName = 
resultSetMetaData.getColumnName(i).toLowerCase();
-String colKey = getStateKey(tableName, colName);
-int colType = resultSetMetaData.getColumnType(i);
-columnTypeMap.putIfAbsent(colKey, colType);
+// Try a query that returns no rows, for the purposes of 
getting metadata about the columns. It is possible
+// to use DatabaseMetaData.getColumns(), but not all drivers 
support this, notably the schema-on-read
+// approach as in Apache Drill
+String query;
+
+if(StringUtils.isEmpty(sqlQuery)) {
+query = dbAdapter.getSelectStatement(tableName, 
maxValueColumnNames, "1 = 0", null, null, null);
+} else {
+StringBuilder sbQuery = getWrappedQuery(sqlQuery, 
tableName);
--- End diff --

I've checked in fixes for everything except this change. I don't want to 
put in any more SQL building logic than I already have hard coded into QDB. 
What if I added a new method to `DatabaseAdapter` for wrapping a `SELECT` 
statement as a sub query. Input parameters would be similar to the existing 
method for building a SELECT statement; column list, where clause, order by 
clause.


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifyin

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-02-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354854#comment-16354854
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user patricker commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r166499522
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java
 ---
@@ -150,8 +151,13 @@ public QueryDatabaseTable() {
 final List pds = new ArrayList<>();
 pds.add(DBCP_SERVICE);
 pds.add(DB_TYPE);
-pds.add(TABLE_NAME);
+pds.add(new PropertyDescriptor.Builder()
+.fromPropertyDescriptor(TABLE_NAME)
+.description("The name of the database table to be 
queried. When a custom query is used, this property is used to alias the query 
and appears as an attribute on the FlowFile.")
+.build());
--- End diff --

Good catch.


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-02-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353875#comment-16353875
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r166296034
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java
 ---
@@ -150,8 +151,13 @@ public QueryDatabaseTable() {
 final List pds = new ArrayList<>();
 pds.add(DBCP_SERVICE);
 pds.add(DB_TYPE);
-pds.add(TABLE_NAME);
+pds.add(new PropertyDescriptor.Builder()
+.fromPropertyDescriptor(TABLE_NAME)
+.description("The name of the database table to be 
queried. When a custom query is used, this property is used to alias the query 
and appears as an attribute on the FlowFile.")
+.build());
--- End diff --

Please update `AbstractDatabaseFetchProcessor.onPropertyModified` so that 
it clears `setupComplete` flag when this TABLE_NAME property is updated, too. 
Without that, when a custom query is used, `columnTypeMap` will not be 
populated again if the processor is reconfigured with different table name 
alias and restarted, that prevent the maxValueColumn to be captured correctly, 
and produce duplicated query result over and over.


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-02-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353874#comment-16353874
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r166280838
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java
 ---
@@ -249,34 +260,56 @@ public void setup(final ProcessContext context, 
boolean shouldCleanCache, FlowFi
 return;
 }
 
-// Try to fill the columnTypeMap with the types of the desired 
max-value columns
-final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
-final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
+// Try to fill the columnTypeMap with the types of the desired 
max-value columns
+final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
+final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
+final String sqlQuery = 
context.getProperty(SQL_QUERY).evaluateAttributeExpressions().getValue();
 
 final DatabaseAdapter dbAdapter = 
dbAdapters.get(context.getProperty(DB_TYPE).getValue());
 try (final Connection con = dbcpService.getConnection();
  final Statement st = con.createStatement()) {
 
-// Try a query that returns no rows, for the purposes of 
getting metadata about the columns. It is possible
-// to use DatabaseMetaData.getColumns(), but not all 
drivers support this, notably the schema-on-read
-// approach as in Apache Drill
-String query = dbAdapter.getSelectStatement(tableName, 
maxValueColumnNames, "1 = 0", null, null, null);
-ResultSet resultSet = st.executeQuery(query);
-ResultSetMetaData resultSetMetaData = 
resultSet.getMetaData();
-int numCols = resultSetMetaData.getColumnCount();
-if (numCols > 0) {
-if (shouldCleanCache) {
-columnTypeMap.clear();
-}
-for (int i = 1; i <= numCols; i++) {
-String colName = 
resultSetMetaData.getColumnName(i).toLowerCase();
-String colKey = getStateKey(tableName, colName);
-int colType = resultSetMetaData.getColumnType(i);
-columnTypeMap.putIfAbsent(colKey, colType);
+// Try a query that returns no rows, for the purposes of 
getting metadata about the columns. It is possible
+// to use DatabaseMetaData.getColumns(), but not all drivers 
support this, notably the schema-on-read
+// approach as in Apache Drill
+String query;
+
+if(StringUtils.isEmpty(sqlQuery)) {
+query = dbAdapter.getSelectStatement(tableName, 
maxValueColumnNames, "1 = 0", null, null, null);
+} else {
+StringBuilder sbQuery = getWrappedQuery(sqlQuery, 
tableName);
+sbQuery.append(" WHERE 1=0");
+
+query = sbQuery.toString();
+}
+
+ResultSet resultSet = st.executeQuery(query);
+ResultSetMetaData resultSetMetaData = resultSet.getMetaData();
+int numCols = resultSetMetaData.getColumnCount();
+if (numCols > 0) {
+if (shouldCleanCache){
+columnTypeMap.clear();
+}
+for (int i = 1; i <= numCols; i++) {
+String colName = 
resultSetMetaData.getColumnName(i).toLowerCase();
+String colKey = getStateKey(tableName, colName);
+int colType = resultSetMetaData.getColumnType(i);
+columnTypeMap.putIfAbsent(colKey, colType);
+}
+
+List maxValueColumnNameList = 
org.apache.commons.lang3.StringUtils.isEmpty(maxValueColumnNames)
+? null
+: 
Arrays.asList(maxValueColumnNames.split("\\s*,\\s*"));
--- End diff --

I guess the aim of the regex `\s*,\s` is to split and trim whitespaces. But 
it leaves whitespaces at the head and tail. I'd suggest simply split with `,` 
and trim it in the for loop below before `toLowerCase()`.


> Extend QueryDa

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-02-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353878#comment-16353878
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r166294484
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java
 ---
@@ -249,34 +260,56 @@ public void setup(final ProcessContext context, 
boolean shouldCleanCache, FlowFi
 return;
 }
 
-// Try to fill the columnTypeMap with the types of the desired 
max-value columns
-final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
-final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
+// Try to fill the columnTypeMap with the types of the desired 
max-value columns
+final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
+final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
+final String sqlQuery = 
context.getProperty(SQL_QUERY).evaluateAttributeExpressions().getValue();
 
 final DatabaseAdapter dbAdapter = 
dbAdapters.get(context.getProperty(DB_TYPE).getValue());
 try (final Connection con = dbcpService.getConnection();
  final Statement st = con.createStatement()) {
 
-// Try a query that returns no rows, for the purposes of 
getting metadata about the columns. It is possible
-// to use DatabaseMetaData.getColumns(), but not all 
drivers support this, notably the schema-on-read
-// approach as in Apache Drill
-String query = dbAdapter.getSelectStatement(tableName, 
maxValueColumnNames, "1 = 0", null, null, null);
-ResultSet resultSet = st.executeQuery(query);
-ResultSetMetaData resultSetMetaData = 
resultSet.getMetaData();
-int numCols = resultSetMetaData.getColumnCount();
-if (numCols > 0) {
-if (shouldCleanCache) {
-columnTypeMap.clear();
-}
-for (int i = 1; i <= numCols; i++) {
-String colName = 
resultSetMetaData.getColumnName(i).toLowerCase();
-String colKey = getStateKey(tableName, colName);
-int colType = resultSetMetaData.getColumnType(i);
-columnTypeMap.putIfAbsent(colKey, colType);
+// Try a query that returns no rows, for the purposes of 
getting metadata about the columns. It is possible
+// to use DatabaseMetaData.getColumns(), but not all drivers 
support this, notably the schema-on-read
+// approach as in Apache Drill
+String query;
+
+if(StringUtils.isEmpty(sqlQuery)) {
+query = dbAdapter.getSelectStatement(tableName, 
maxValueColumnNames, "1 = 0", null, null, null);
+} else {
+StringBuilder sbQuery = getWrappedQuery(sqlQuery, 
tableName);
--- End diff --

Since `getWrappedQuery` fetches all available columns in the sub-query by 
'*', subsequent loop stores all column values into the managed state. We should 
avoid storing unnecessary values into the state by either specifying the 
maxValueColumns to this query, or do some filtering in the loop below.


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-02-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353873#comment-16353873
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r166277383
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java
 ---
@@ -249,34 +260,56 @@ public void setup(final ProcessContext context, 
boolean shouldCleanCache, FlowFi
 return;
 }
 
-// Try to fill the columnTypeMap with the types of the desired 
max-value columns
-final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
-final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
+// Try to fill the columnTypeMap with the types of the desired 
max-value columns
+final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
+final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
+final String sqlQuery = 
context.getProperty(SQL_QUERY).evaluateAttributeExpressions().getValue();
--- End diff --

Wrong indent.


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-02-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353877#comment-16353877
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r166277436
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java
 ---
@@ -249,34 +260,56 @@ public void setup(final ProcessContext context, 
boolean shouldCleanCache, FlowFi
 return;
 }
 
-// Try to fill the columnTypeMap with the types of the desired 
max-value columns
-final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
-final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
+// Try to fill the columnTypeMap with the types of the desired 
max-value columns
+final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
+final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
+final String sqlQuery = 
context.getProperty(SQL_QUERY).evaluateAttributeExpressions().getValue();
 
 final DatabaseAdapter dbAdapter = 
dbAdapters.get(context.getProperty(DB_TYPE).getValue());
 try (final Connection con = dbcpService.getConnection();
  final Statement st = con.createStatement()) {
 
-// Try a query that returns no rows, for the purposes of 
getting metadata about the columns. It is possible
-// to use DatabaseMetaData.getColumns(), but not all 
drivers support this, notably the schema-on-read
-// approach as in Apache Drill
-String query = dbAdapter.getSelectStatement(tableName, 
maxValueColumnNames, "1 = 0", null, null, null);
-ResultSet resultSet = st.executeQuery(query);
-ResultSetMetaData resultSetMetaData = 
resultSet.getMetaData();
-int numCols = resultSetMetaData.getColumnCount();
-if (numCols > 0) {
-if (shouldCleanCache) {
-columnTypeMap.clear();
-}
-for (int i = 1; i <= numCols; i++) {
-String colName = 
resultSetMetaData.getColumnName(i).toLowerCase();
-String colKey = getStateKey(tableName, colName);
-int colType = resultSetMetaData.getColumnType(i);
-columnTypeMap.putIfAbsent(colKey, colType);
+// Try a query that returns no rows, for the purposes of 
getting metadata about the columns. It is possible
--- End diff --

Wrong indent.


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-02-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353876#comment-16353876
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r166278036
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java
 ---
@@ -249,34 +260,56 @@ public void setup(final ProcessContext context, 
boolean shouldCleanCache, FlowFi
 return;
 }
 
-// Try to fill the columnTypeMap with the types of the desired 
max-value columns
-final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
-final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
+// Try to fill the columnTypeMap with the types of the desired 
max-value columns
+final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
+final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
+final String sqlQuery = 
context.getProperty(SQL_QUERY).evaluateAttributeExpressions().getValue();
 
 final DatabaseAdapter dbAdapter = 
dbAdapters.get(context.getProperty(DB_TYPE).getValue());
 try (final Connection con = dbcpService.getConnection();
  final Statement st = con.createStatement()) {
 
-// Try a query that returns no rows, for the purposes of 
getting metadata about the columns. It is possible
-// to use DatabaseMetaData.getColumns(), but not all 
drivers support this, notably the schema-on-read
-// approach as in Apache Drill
-String query = dbAdapter.getSelectStatement(tableName, 
maxValueColumnNames, "1 = 0", null, null, null);
-ResultSet resultSet = st.executeQuery(query);
-ResultSetMetaData resultSetMetaData = 
resultSet.getMetaData();
-int numCols = resultSetMetaData.getColumnCount();
-if (numCols > 0) {
-if (shouldCleanCache) {
-columnTypeMap.clear();
-}
-for (int i = 1; i <= numCols; i++) {
-String colName = 
resultSetMetaData.getColumnName(i).toLowerCase();
-String colKey = getStateKey(tableName, colName);
-int colType = resultSetMetaData.getColumnType(i);
-columnTypeMap.putIfAbsent(colKey, colType);
+// Try a query that returns no rows, for the purposes of 
getting metadata about the columns. It is possible
+// to use DatabaseMetaData.getColumns(), but not all drivers 
support this, notably the schema-on-read
+// approach as in Apache Drill
+String query;
+
+if(StringUtils.isEmpty(sqlQuery)) {
+query = dbAdapter.getSelectStatement(tableName, 
maxValueColumnNames, "1 = 0", null, null, null);
+} else {
+StringBuilder sbQuery = getWrappedQuery(sqlQuery, 
tableName);
+sbQuery.append(" WHERE 1=0");
+
+query = sbQuery.toString();
+}
+
+ResultSet resultSet = st.executeQuery(query);
+ResultSetMetaData resultSetMetaData = resultSet.getMetaData();
+int numCols = resultSetMetaData.getColumnCount();
+if (numCols > 0) {
+if (shouldCleanCache){
+columnTypeMap.clear();
+}
+for (int i = 1; i <= numCols; i++) {
+String colName = 
resultSetMetaData.getColumnName(i).toLowerCase();
+String colKey = getStateKey(tableName, colName);
+int colType = resultSetMetaData.getColumnType(i);
+columnTypeMap.putIfAbsent(colKey, colType);
+}
+
+List maxValueColumnNameList = 
org.apache.commons.lang3.StringUtils.isEmpty(maxValueColumnNames)
--- End diff --

I think we can use `org.apache.nifi.util.StringUtils` here instead, which 
is already imported. Moreover, we can remove this emptiness check because it's 
already checked at the beginning of this method.


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-02-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352075#comment-16352075
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user patricker commented on the issue:

https://github.com/apache/nifi/pull/2162
  
@ijokarumawak Updated. 


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-01-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336907#comment-16336907
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user patricker commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r163454074
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java
 ---
@@ -156,10 +157,22 @@
 .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
 .build();
 
+public static final PropertyDescriptor SQL_QUERY = new 
PropertyDescriptor.Builder()
+.name("db-fetch-sql-query")
+.displayName("Custom Query")
+.description("A custom SQL query used to retrieve data. 
Instead of building a SQL query from "
++ "other properties, this query will be used. Query 
must have no WHERE or ORDER BY statements. "
++ "If a WHERE clause is needed use a sub-query or the 
'Additional WHERE clause' property.")
--- End diff --

Sounds like a good idea. Users still won't be able to include an ORDER BY, 
as those are not allowed in sub queries in every database I've worked with.


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-01-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336812#comment-16336812
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r163441205
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java
 ---
@@ -366,10 +401,22 @@ public void onTrigger(final ProcessContext context, 
final ProcessSessionFactory
 
 protected String getQuery(DatabaseAdapter dbAdapter, String tableName, 
String columnNames, List maxValColumnNames,
   String customWhereClause, Map stateMap) {
+
+return getQuery(dbAdapter, tableName, "", columnNames, 
maxValColumnNames, customWhereClause, stateMap);
--- End diff --

`null` may be preferable over the empty string object for `sqlQuery`.


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-01-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336813#comment-16336813
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r163442381
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java
 ---
@@ -156,10 +157,22 @@
 .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
 .build();
 
+public static final PropertyDescriptor SQL_QUERY = new 
PropertyDescriptor.Builder()
+.name("db-fetch-sql-query")
+.displayName("Custom Query")
+.description("A custom SQL query used to retrieve data. 
Instead of building a SQL query from "
++ "other properties, this query will be used. Query 
must have no WHERE or ORDER BY statements. "
++ "If a WHERE clause is needed use a sub-query or the 
'Additional WHERE clause' property.")
--- End diff --

Can we generate a sub-query from this property by simply wrapping it with 
braces and add the specified `Table name` as the alias name of it? If we do so, 
we can use Expression Language and incoming FlowFiles more creatively with the 
single QueryDatabaseTable processor instance. Also it will allow user to define 
WHERE and ORDER BY statement here.

With that approach, we can keep using the `Table Name` property as it is, 
and the custom validation is not required. This `Custom Query` property can be 
truly an optional property.

Current proposed approach use a constant "NiFi_QDBT" table name for state 
keys, and it can limit the use-case. How do you think?


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-01-23 Thread Koji Kawamura (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336788#comment-16336788
 ] 

Koji Kawamura commented on NIFI-1706:
-

Thanks [~patricker]. I thought about that situation where NiFi admin is not 
allowed to create VIEWs, too. By looking at the proposed change, and other 
alternatives you listed in the previous comment. I agree with you, extending 
QueryDatabaseTable looks the way to go. I will try to find a time to review the 
change.

> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-01-23 Thread Peter Wicks (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336746#comment-16336746
 ] 

Peter Wicks commented on NIFI-1706:
---

[~ijokarumawak] I've written many views for use with QueryDatabaseTable, and 
it's my preferred approach.

The use case for allowing custom SQL is more about permissions. In some cases I 
have read-only permission to the server/database, and the group that owns the 
system is not willing to allow me to create views; generally it's a combination 
of they don't have time to support me + every other team that writes reports 
(MS SQL Reporting Services, Tableau, etc...) is successful with their currently 
structure, so why can't I be?

There are alternatives to adding this functionality to QueryDatabaseTable:
 * Create a brand new processor that is a cross between QueryDatabaseTable and 
ExecuteSQL
 * Extend ExecuteSQL to include optional state tracking features. It could be a 
more manual process where users have to provide initial max values and then 
insert the max values into the query by hand using EL.

> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-01-22 Thread Koji Kawamura (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334006#comment-16334006
 ] 

Koji Kawamura commented on NIFI-1706:
-

[~pee...@gmail.com] You can define select queries in the target RDBMS as VIEW. 
This is an example of MySQL, you can use JOIN in VIEW to provide rich data. 
https://stackoverflow.com/questions/12352048/mysql-create-view-joining-two-tables

> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-01-22 Thread Paul Bormans (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333992#comment-16333992
 ] 

Paul Bormans commented on NIFI-1706:


Hello Koji,

I'm not familiar with the Views concept (have been "out" for some some since i 
wrote the ticket), but i still feel that the processor is not very useful if 
the aim is to extract "rich" data from an rdbms without support for a more 
complex query. The only alternative that you then have is to extract on 
per-table basis and join somewhere else...

Paul

 

> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2018-01-21 Thread Koji Kawamura (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333933#comment-16333933
 ] 

Koji Kawamura commented on NIFI-1706:
-

Hi [~pee...@gmail.com] [~patricker],

I found a similar JIRA NIFI-2200 before I found this one. I closed NIFI-2200 as 
I believe VIEWs can replace sub-query needs. VIEWs can cover the improvements 
requested by this JIRA, too. I personally prefer to keep a processor 
implementation as simple as possible for sustainable maintainability. I 
appreciate Peter's work but wanted to discuss. How do you think? Thank you.

> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>Priority: Major
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2017-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211993#comment-16211993
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user mattyb149 commented on the issue:

https://github.com/apache/nifi/pull/2162
  
Yes, sorry for the delay, got swamped 


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2017-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211979#comment-16211979
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user patricker commented on the issue:

https://github.com/apache/nifi/pull/2162
  
@mattyb149 Can you take another look? I think I've addressed all of your 
concerns.


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2017-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198363#comment-16198363
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user patricker commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r143664137
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java
 ---
@@ -240,7 +254,14 @@ public void setup(final ProcessContext context) {
 // Try a query that returns no rows, for the purposes of 
getting metadata about the columns. It is possible
 // to use DatabaseMetaData.getColumns(), but not all drivers 
support this, notably the schema-on-read
 // approach as in Apache Drill
-String query = dbAdapter.getSelectStatement(tableName, 
maxValueColumnNames, "1 = 0", null, null, null);
+String query;
+
+if(StringUtils.isEmpty(sqlQuery)) {
+query = dbAdapter.getSelectStatement(tableName, 
maxValueColumnNames, "1 = 0", null, null, null);
+} else {
+query=sqlQuery + " WHERE 1=0";
--- End diff --

@mattyb149 

> If they specify a max-value column in the other property, and it is not 
available in this query, then the getSelectStatement() below doesn't seem like 
it would work as expected

What about if I take the list of maxValueColumns and use those to build the 
equivelant of the "1=0" condition. For example, if our max value column names 
are `x` and `y` I could build a where expression `x <> x AND y <> y`. This 
would ensure the column names were present, and if they aren't an exception 
would be thrown.


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2017-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198357#comment-16198357
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user patricker commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r143663025
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java
 ---
@@ -190,8 +222,11 @@ public void onTrigger(final ProcessContext context, 
final ProcessSessionFactory
 
 final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
 final DatabaseAdapter dbAdapter = 
dbAdapters.get(context.getProperty(DB_TYPE).getValue());
-final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions().getValue();
+
+final String propTableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions().getValue();
+final String tableName = StringUtils.isEmpty(propTableName) ? 
ARBITRARY_SQL_TABLE_NAME : propTableName;
 final String columnNames = 
context.getProperty(COLUMN_NAMES).evaluateAttributeExpressions().getValue();
+final String sqlQuery = context.getProperty(SQL_QUERY).getValue();
--- End diff --

Makes sense to me.


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2017-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198347#comment-16198347
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user patricker commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r143661252
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java
 ---
@@ -231,7 +243,9 @@ public void setup(final ProcessContext context) {
 
 // Try to fill the columnTypeMap with the types of the desired 
max-value columns
 final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
-final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions().getValue();
+final String propTableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions().getValue();
+final String tableName = 
org.apache.commons.lang3.StringUtils.isEmpty(propTableName) ? 
ARBITRARY_SQL_TABLE_NAME : propTableName;
--- End diff --

Just some copy/paste action from another chunk of code.


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2017-10-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195219#comment-16195219
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r143279372
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java
 ---
@@ -175,6 +181,32 @@ public QueryDatabaseTable() {
 return propDescriptors;
 }
 
+@Override
+protected Collection 
customValidate(ValidationContext validationContext) {
+final List results = new 
ArrayList<>(super.customValidate(validationContext));
+
+final String tableName = 
validationContext.getProperty(TABLE_NAME).getValue();
+final String sqlQuery = 
validationContext.getProperty(SQL_QUERY).getValue();
+
+if(!StringUtils.isEmpty(sqlQuery) && 
!StringUtils.isEmpty(tableName)){
+results.add(new ValidationResult.Builder()
+.valid(false)
+.subject("SQL Query")
+.explanation("SQL Query and Table Name can't both 
be specified at the same time.")
--- End diff --

This might need some elaboration, in case the user wouldn't necessarily 
understand why they can't both be specified (due to EL support in the Table 
Name property, e.g.)


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2017-10-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195220#comment-16195220
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r143280289
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java
 ---
@@ -175,6 +181,32 @@ public QueryDatabaseTable() {
 return propDescriptors;
 }
 
+@Override
+protected Collection 
customValidate(ValidationContext validationContext) {
+final List results = new 
ArrayList<>(super.customValidate(validationContext));
+
+final String tableName = 
validationContext.getProperty(TABLE_NAME).getValue();
+final String sqlQuery = 
validationContext.getProperty(SQL_QUERY).getValue();
+
+if(!StringUtils.isEmpty(sqlQuery) && 
!StringUtils.isEmpty(tableName)){
+results.add(new ValidationResult.Builder()
+.valid(false)
+.subject("SQL Query")
--- End diff --

This should match either the "Arbitrary/Custom Query" or "Table Name" 
property or both I think


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2017-10-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195218#comment-16195218
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r143290546
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java
 ---
@@ -175,6 +181,32 @@ public QueryDatabaseTable() {
 return propDescriptors;
 }
 
+@Override
+protected Collection 
customValidate(ValidationContext validationContext) {
+final List results = new 
ArrayList<>(super.customValidate(validationContext));
+
+final String tableName = 
validationContext.getProperty(TABLE_NAME).getValue();
+final String sqlQuery = 
validationContext.getProperty(SQL_QUERY).getValue();
+
+if(!StringUtils.isEmpty(sqlQuery) && 
!StringUtils.isEmpty(tableName)){
+results.add(new ValidationResult.Builder()
+.valid(false)
+.subject("SQL Query")
+.explanation("SQL Query and Table Name can't both 
be specified at the same time.")
+.build());
+}
+
+if(!StringUtils.isEmpty(sqlQuery) && isDynamicMaxValues){
--- End diff --

Is this a hard requirement? Expression Language support (since incoming 
connections are not allowed for QueryDatabaseTable) is meant to support 
Variable Registry and environment variables, which could be used in a custom 
query if it also included such an EL statement?


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2017-10-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195216#comment-16195216
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r143289913
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java
 ---
@@ -77,7 +80,8 @@
 @InputRequirement(Requirement.INPUT_FORBIDDEN)
 @Tags({"sql", "select", "jdbc", "query", "database"})
 @SeeAlso({GenerateTableFetch.class, ExecuteSQL.class})
-@CapabilityDescription("Generates and executes a SQL select query to fetch 
all rows whose values in the specified Maximum Value column(s) are larger than 
the "
+@CapabilityDescription("Generates a SQL select query, or  uses a provided 
statement, and executes it to fetch all rows whose values in the specified "
--- End diff --

Nitpick, but an extra space between "or  uses"


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2017-10-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195215#comment-16195215
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r143289820
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java
 ---
@@ -231,7 +243,9 @@ public void setup(final ProcessContext context) {
 
 // Try to fill the columnTypeMap with the types of the desired 
max-value columns
 final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
-final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions().getValue();
+final String propTableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions().getValue();
+final String tableName = 
org.apache.commons.lang3.StringUtils.isEmpty(propTableName) ? 
ARBITRARY_SQL_TABLE_NAME : propTableName;
--- End diff --

Why the fully-qualified StringUtils class? If we have both (NiFi and 
Commons Lang), can we get rid of one?


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2017-10-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195217#comment-16195217
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r143290669
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java
 ---
@@ -190,8 +222,11 @@ public void onTrigger(final ProcessContext context, 
final ProcessSessionFactory
 
 final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
 final DatabaseAdapter dbAdapter = 
dbAdapters.get(context.getProperty(DB_TYPE).getValue());
-final String tableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions().getValue();
+
+final String propTableName = 
context.getProperty(TABLE_NAME).evaluateAttributeExpressions().getValue();
+final String tableName = StringUtils.isEmpty(propTableName) ? 
ARBITRARY_SQL_TABLE_NAME : propTableName;
 final String columnNames = 
context.getProperty(COLUMN_NAMES).evaluateAttributeExpressions().getValue();
+final String sqlQuery = context.getProperty(SQL_QUERY).getValue();
--- End diff --

Should this support Expression Language for Variable Registry support (see 
my comment above)?


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2017-10-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195214#comment-16195214
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r143279020
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java
 ---
@@ -155,10 +155,22 @@
 .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
 .build();
 
+public static final PropertyDescriptor SQL_QUERY = new 
PropertyDescriptor.Builder()
+.name("db-fetch-sql-query")
+.displayName("Arbitrary Query")
--- End diff --

Possibly call this Custom Query? Arbitrary is... well arbitrary :P and 
might cause some confusion for the user. Plus the description refers to it as a 
custom query


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2017-10-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195213#comment-16195213
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2162#discussion_r143278853
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java
 ---
@@ -155,10 +155,22 @@
 .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
 .build();
 
+public static final PropertyDescriptor SQL_QUERY = new 
PropertyDescriptor.Builder()
+.name("db-fetch-sql-query")
+.displayName("Arbitrary Query")
+.description("A custom SQL query used to retrieve data. 
Instead of building a SQL query from "
++ "other properties, this query will be used. Query 
must have no WHERE or ORDER BY statements. "
++ "If a WHERE clause is needed use a sub-query or the 
'Additional WHERE clause' property.")
--- End diff --

Do we need more doc here around max-value columns? If they specify a 
max-value column in the other property, and it is not available in this query, 
then the getSelectStatement() below doesn't seem like it would work as expected


> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2017-09-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171151#comment-16171151
 ] 

ASF GitHub Bot commented on NIFI-1706:
--

GitHub user patricker opened a pull request:

https://github.com/apache/nifi/pull/2162

NIFI-1706 Extend QueryDatabaseTable to support arbitrary queries



### For all changes:
- [x] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [x] Does your PR title start with NIFI- where  is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.

- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [x] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
- [x] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [x] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/patricker/nifi NIFI-1706

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2162.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2162


commit 5169001108051c395be8cee1a08c7dc1fbf1b0f5
Author: patricker 
Date:   2017-09-19T05:50:06Z

NIFI-1706




> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (NIFI-1706) Extend QueryDatabaseTable to support arbitrary queries

2017-09-18 Thread Peter Wicks (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169771#comment-16169771
 ] 

Peter Wicks commented on NIFI-1706:
---

I have this written up with test cases, just working through some more hands on 
use case testing.

> Extend QueryDatabaseTable to support arbitrary queries
> --
>
> Key: NIFI-1706
> URL: https://issues.apache.org/jira/browse/NIFI-1706
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Paul Bormans
>Assignee: Peter Wicks
>  Labels: features
>
> The QueryDatabaseTable is able to observe a configured database table for new 
> rows and yield these into the flowfile. The model of an rdbms however is 
> often (if not always) normalized so you would need to join various tables in 
> order to "flatten" the data into useful events for a processing pipeline as 
> can be build with nifi or various tools within the hadoop ecosystem.
> The request is to extend the processor to specify an arbitrary sql query 
> instead of specifying the table name + columns.
> In addition (this may be another issue?) it is desired to limit the number of 
> rows returned per run. Not just because of bandwidth issue's from the nifi 
> pipeline onwards but mainly because huge databases may not be able to return 
> so many records within a reasonable time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

43 matches

Mail list logo