[GitHub] nifi pull request #2695: NIFI-5044 SelectHiveQL accept only one statement

2018-06-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/2695


---


[GitHub] nifi pull request #2695: NIFI-5044 SelectHiveQL accept only one statement

2018-06-10 Thread mattyb149
Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2695#discussion_r194283856
  
--- Diff: 
nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/SelectHiveQL.java
 ---
@@ -443,31 +481,72 @@ private void onTrigger(final ProcessContext context, 
final ProcessSession sessio
 throw e;
 }
 
-session.transfer(resultSetFlowFiles, REL_SUCCESS);
+failure = executeConfigStatements(con, postQueries);
+if (failure != null) {
+hqlStatement = failure.getLeft();
+if (resultSetFlowFiles != null) {
+resultSetFlowFiles.forEach(ff -> session.remove(ff));
+}
+flowfile = (fileToProcess == null) ? session.create() : 
fileToProcess;
+fileToProcess = null;
+throw failure.getRight();
+}
 
+session.transfer(resultSetFlowFiles, REL_SUCCESS);
+if (fileToProcess != null) {
+session.remove(fileToProcess);
+}
 } catch (final ProcessException | SQLException e) {
-logger.error("Issue processing SQL {} due to {}.", new 
Object[]{selectQuery, e});
+logger.error("Issue processing SQL {} due to {}.", new 
Object[]{hqlStatement, e});
 if (flowfile == null) {
 // This can happen if any exceptions occur while setting 
up the connection, statement, etc.
 logger.error("Unable to execute HiveQL select query {} due 
to {}. No FlowFile to route to failure",
-new Object[]{selectQuery, e});
+new Object[]{hqlStatement, e});
 context.yield();
 } else {
 if (context.hasIncomingConnection()) {
 logger.error("Unable to execute HiveQL select query {} 
for {} due to {}; routing to failure",
-new Object[]{selectQuery, flowfile, e});
+new Object[]{hqlStatement, flowfile, e});
 flowfile = session.penalize(flowfile);
 } else {
 logger.error("Unable to execute HiveQL select query {} 
due to {}; routing to failure",
-new Object[]{selectQuery, e});
+new Object[]{hqlStatement, e});
 context.yield();
 }
 session.transfer(flowfile, REL_FAILURE);
 }
-} finally {
-if (fileToProcess != null) {
-session.remove(fileToProcess);
+}
+}
+
+/*
+ * Executes given queries using pre-defined connection.
+ * Returns null on success, or a query string if failed.
+ */
+protected Pair executeConfigStatements(final 
Connection con, final List configQueries){
+if (configQueries == null || configQueries.isEmpty()) {
+return null;
+}
+
+for (String confSQL : configQueries) {
+try(final Statement st = con.createStatement()){
+st.executeQuery(confSQL);
--- End diff --

Shouldn't this be st.execute()? When I run with a single "set" statement I 
get a SQLException that the query did not return a result set...


---


[GitHub] nifi pull request #2695: NIFI-5044 SelectHiveQL accept only one statement

2018-06-08 Thread mattyb149
Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2695#discussion_r194183320
  
--- Diff: 
nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/test/java/org/apache/nifi/processors/hive/TestSelectHiveQL.java
 ---
@@ -198,6 +200,51 @@ public void testWithSqlException() throws SQLException 
{
 runner.assertAllFlowFilesTransferred(SelectHiveQL.REL_FAILURE, 1);
 }
 
+@Test
+public void invokeOnTriggerExceptionInPreQieriesNoIncomingFlows()
+throws InitializationException, ClassNotFoundException, 
SQLException, IOException {
+
+doOnTrigger(QUERY_WITHOUT_EL, false, CSV,
+"select 'no exception' from persons; select exception from 
persons",
+null);
+
+runner.assertAllFlowFilesTransferred(SelectHiveQL.REL_FAILURE, 1);
--- End diff --

I must have been thinking of some other processor or perhaps the behavior 
had changed at some point, sorry about that. As long as it behaves the same way 
it used to with respect to where/if FFs get transferred, then I'm good :)


---


[GitHub] nifi pull request #2695: NIFI-5044 SelectHiveQL accept only one statement

2018-05-24 Thread bdesert
Github user bdesert commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2695#discussion_r190653106
  
--- Diff: 
nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/test/java/org/apache/nifi/processors/hive/TestSelectHiveQL.java
 ---
@@ -198,6 +200,51 @@ public void testWithSqlException() throws SQLException 
{
 runner.assertAllFlowFilesTransferred(SelectHiveQL.REL_FAILURE, 1);
 }
 
+@Test
+public void invokeOnTriggerExceptionInPreQieriesNoIncomingFlows()
+throws InitializationException, ClassNotFoundException, 
SQLException, IOException {
+
+doOnTrigger(QUERY_WITHOUT_EL, false, CSV,
+"select 'no exception' from persons; select exception from 
persons",
+null);
+
+runner.assertAllFlowFilesTransferred(SelectHiveQL.REL_FAILURE, 1);
--- End diff --

>In general the behavior should remain the same whenever possible

Currently, if there is an error in SQL Query - it will go to "failure" 
relationship (even if there are no incoming connections)

![image](https://user-images.githubusercontent.com/19496093/40499024-ae42cbec-5f4e-11e8-9ff0-348dd6793b2a.png)

![image](https://user-images.githubusercontent.com/19496093/40498559-68c064c2-5f4d-11e8-9a18-88fcb082b18e.png)

So, I follow current error handling strategy. It's just wasn't accurate 
about:

> since we weren't issuing a flow file on failure before

because we do issue FF on failure (on establishing connection it is 
different though, and not impacted by this change).

@mattyb149 , Any word on this?


---


[GitHub] nifi pull request #2695: NIFI-5044 SelectHiveQL accept only one statement

2018-05-24 Thread pvillard31
Github user pvillard31 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2695#discussion_r190637767
  
--- Diff: 
nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/test/java/org/apache/nifi/processors/hive/TestSelectHiveQL.java
 ---
@@ -198,6 +200,51 @@ public void testWithSqlException() throws SQLException 
{
 runner.assertAllFlowFilesTransferred(SelectHiveQL.REL_FAILURE, 1);
 }
 
+@Test
+public void invokeOnTriggerExceptionInPreQieriesNoIncomingFlows()
+throws InitializationException, ClassNotFoundException, 
SQLException, IOException {
+
+doOnTrigger(QUERY_WITHOUT_EL, false, CSV,
+"select 'no exception' from persons; select exception from 
persons",
+null);
+
+runner.assertAllFlowFilesTransferred(SelectHiveQL.REL_FAILURE, 1);
+}
+
+@Test
+public void invokeOnTriggerExceptionInPreQieriesWithIncomingFlows()
+throws InitializationException, ClassNotFoundException, 
SQLException, IOException {
+
+doOnTrigger(QUERY_WITHOUT_EL, true, CSV,
+"select 'no exception' from persons; select exception from 
persons",
+null);
+
+runner.assertAllFlowFilesTransferred(SelectHiveQL.REL_FAILURE, 1);
+}
+
+@Test
+public void invokeOnTriggerExceptionInPostQieriesNoIncomingFlows()
+throws InitializationException, ClassNotFoundException, 
SQLException, IOException {
+
+doOnTrigger(QUERY_WITHOUT_EL, false, CSV,
+null,
+"select 'no exception' from persons; select exception from 
persons");
+
+runner.assertAllFlowFilesTransferred(SelectHiveQL.REL_FAILURE, 1);
--- End diff --

Same here


---


[GitHub] nifi pull request #2695: NIFI-5044 SelectHiveQL accept only one statement

2018-05-24 Thread pvillard31
Github user pvillard31 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2695#discussion_r190633734
  
--- Diff: 
nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/SelectHiveQL.java
 ---
@@ -113,6 +126,17 @@
 
.expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
 .build();
 
+public static final PropertyDescriptor HIVEQL_POST_QUERY = new 
PropertyDescriptor.Builder()
+.name("hive-post-query")
+.displayName("HiveQL Post-Query")
+.description("HiveQL post-query to execute. 
Semicolon-delimited list of queries. "
++ "Example: 'set tez.queue.name=default; set 
hive.exec.orc.split.strategy=HYBRID; set 
hive.exec.reducers.bytes.per.reducer=258998272'. "
++ "Note, the results/outputs of these queries will be 
suppressed if successful executed.")
+.required(false)
+.addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
--- End diff --

Yeah, tbh, I'm really not sure it's worth the effort. I'm fine with current 
approach.


---


[GitHub] nifi pull request #2695: NIFI-5044 SelectHiveQL accept only one statement

2018-05-24 Thread pvillard31
Github user pvillard31 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2695#discussion_r190633475
  
--- Diff: 
nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/SelectHiveQL.java
 ---
@@ -113,6 +126,17 @@
 
.expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
 .build();
 
+public static final PropertyDescriptor HIVEQL_POST_QUERY = new 
PropertyDescriptor.Builder()
+.name("hive-post-query")
+.displayName("HiveQL Post-Query")
+.description("HiveQL post-query to execute. 
Semicolon-delimited list of queries. "
++ "Example: 'set tez.queue.name=default; set 
hive.exec.orc.split.strategy=HYBRID; set 
hive.exec.reducers.bytes.per.reducer=258998272'. "
--- End diff --

don't have anything in mind :) maybe just removing the examples to be sure 
it's not misleading somehow?


---


[GitHub] nifi pull request #2695: NIFI-5044 SelectHiveQL accept only one statement

2018-05-24 Thread pvillard31
Github user pvillard31 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2695#discussion_r190637825
  
--- Diff: 
nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/test/java/org/apache/nifi/processors/hive/TestSelectHiveQL.java
 ---
@@ -198,6 +200,51 @@ public void testWithSqlException() throws SQLException 
{
 runner.assertAllFlowFilesTransferred(SelectHiveQL.REL_FAILURE, 1);
 }
 
+@Test
+public void invokeOnTriggerExceptionInPreQieriesNoIncomingFlows()
+throws InitializationException, ClassNotFoundException, 
SQLException, IOException {
+
+doOnTrigger(QUERY_WITHOUT_EL, false, CSV,
+"select 'no exception' from persons; select exception from 
persons",
+null);
+
+runner.assertAllFlowFilesTransferred(SelectHiveQL.REL_FAILURE, 1);
--- End diff --

Based on the comment from @mattyb149 in the JIRA:
> In general the behavior should remain the same whenever possible, so if 
you have a SelectHiveQL that doesn't have incoming (non-loop) connections, then 
the query must be supplied, and whether it (or the pre-post queries) have EL in 
them, then since we weren't issuing a flow file on failure before, we shouldn't 
now either. So when I said "Route to Failure with penalization for everything 
else", that's only when there is a flow file to route. If there isn't, then we 
should yield (and remove any created flow files and/or rollback our session 
anyway).

Not sure to understand the assertion here since there is no incoming ff, 
right?


---


[GitHub] nifi pull request #2695: NIFI-5044 SelectHiveQL accept only one statement

2018-05-17 Thread bdesert
Github user bdesert commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2695#discussion_r189075604
  
--- Diff: 
nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/SelectHiveQL.java
 ---
@@ -311,7 +340,15 @@ private void onTrigger(final ProcessContext context, 
final ProcessSession sessio
 try (final Connection con = 
dbcpService.getConnection(fileToProcess == null ? Collections.emptyMap() : 
fileToProcess.getAttributes());
  final Statement st = (flowbased ? 
con.prepareStatement(selectQuery) : con.createStatement())
 ) {
-
+Pair failure = 
executeConfigStatements(con, preQueries);
+if (failure != null) {
+// In case of failure, assigning config query to 
"selectQuery" var will allow to avoid major changes in error handling (catch 
block),
--- End diff --

@pvillard31 , I think if I just rename this var into "hqlStatement" - that 
will remove confusions and make reusing it for pre- and post- queries natural. 
Going to push updates for this.


---


[GitHub] nifi pull request #2695: NIFI-5044 SelectHiveQL accept only one statement

2018-05-17 Thread bdesert
Github user bdesert commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2695#discussion_r189008316
  
--- Diff: 
nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/SelectHiveQL.java
 ---
@@ -113,6 +126,17 @@
 
.expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
 .build();
 
+public static final PropertyDescriptor HIVEQL_POST_QUERY = new 
PropertyDescriptor.Builder()
+.name("hive-post-query")
+.displayName("HiveQL Post-Query")
+.description("HiveQL post-query to execute. 
Semicolon-delimited list of queries. "
++ "Example: 'set tez.queue.name=default; set 
hive.exec.orc.split.strategy=HYBRID; set 
hive.exec.reducers.bytes.per.reducer=258998272'. "
++ "Note, the results/outputs of these queries will be 
suppressed if successful executed.")
+.required(false)
+.addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
--- End diff --

Well, would be interesting to have a parser here, but then it should be 
global, and include also main query ("select" statement). I believe it can be 
done, but out of scope for this change. We can have another Improvement request 
in Jira to have new HiveQL validator for all Hive related processors.


---


[GitHub] nifi pull request #2695: NIFI-5044 SelectHiveQL accept only one statement

2018-05-17 Thread bdesert
Github user bdesert commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2695#discussion_r189005203
  
--- Diff: 
nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/SelectHiveQL.java
 ---
@@ -113,6 +126,17 @@
 
.expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
 .build();
 
+public static final PropertyDescriptor HIVEQL_POST_QUERY = new 
PropertyDescriptor.Builder()
+.name("hive-post-query")
+.displayName("HiveQL Post-Query")
+.description("HiveQL post-query to execute. 
Semicolon-delimited list of queries. "
++ "Example: 'set tez.queue.name=default; set 
hive.exec.orc.split.strategy=HYBRID; set 
hive.exec.reducers.bytes.per.reducer=258998272'. "
--- End diff --

In my opinion - there shouldn't be "post" queries at all, as after "select" 
it doesn't make sense to run anything. But based on discussion, there is a 
demand for "post-queries", that's why we implement.  I'll be happy to get any 
suggestion on what could be a good example for post-query :)


---


[GitHub] nifi pull request #2695: NIFI-5044 SelectHiveQL accept only one statement

2018-05-17 Thread pvillard31
Github user pvillard31 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2695#discussion_r188990055
  
--- Diff: 
nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/SelectHiveQL.java
 ---
@@ -311,7 +340,15 @@ private void onTrigger(final ProcessContext context, 
final ProcessSession sessio
 try (final Connection con = 
dbcpService.getConnection(fileToProcess == null ? Collections.emptyMap() : 
fileToProcess.getAttributes());
  final Statement st = (flowbased ? 
con.prepareStatement(selectQuery) : con.createStatement())
 ) {
-
+Pair failure = 
executeConfigStatements(con, preQueries);
+if (failure != null) {
+// In case of failure, assigning config query to 
"selectQuery" var will allow to avoid major changes in error handling (catch 
block),
--- End diff --

I understand the intent here, just wondering if it's a good idea on the 
long term for code maintainability/clarity. @mattyb149 what are your thoughts?


---


[GitHub] nifi pull request #2695: NIFI-5044 SelectHiveQL accept only one statement

2018-05-17 Thread pvillard31
Github user pvillard31 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2695#discussion_r18896
  
--- Diff: 
nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/SelectHiveQL.java
 ---
@@ -113,6 +126,17 @@
 
.expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
 .build();
 
+public static final PropertyDescriptor HIVEQL_POST_QUERY = new 
PropertyDescriptor.Builder()
+.name("hive-post-query")
+.displayName("HiveQL Post-Query")
+.description("HiveQL post-query to execute. 
Semicolon-delimited list of queries. "
++ "Example: 'set tez.queue.name=default; set 
hive.exec.orc.split.strategy=HYBRID; set 
hive.exec.reducers.bytes.per.reducer=258998272'. "
++ "Note, the results/outputs of these queries will be 
suppressed if successful executed.")
+.required(false)
+.addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
--- End diff --

Could it make sense to have a custom validator here just to be sure the 
statements will be accepted by the driver? (would need to check if EL is used 
or not first). Not sure it's worth the trouble though...


---


[GitHub] nifi pull request #2695: NIFI-5044 SelectHiveQL accept only one statement

2018-05-17 Thread pvillard31
Github user pvillard31 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2695#discussion_r188985931
  
--- Diff: 
nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/SelectHiveQL.java
 ---
@@ -113,6 +126,17 @@
 
.expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
 .build();
 
+public static final PropertyDescriptor HIVEQL_POST_QUERY = new 
PropertyDescriptor.Builder()
+.name("hive-post-query")
+.displayName("HiveQL Post-Query")
+.description("HiveQL post-query to execute. 
Semicolon-delimited list of queries. "
++ "Example: 'set tez.queue.name=default; set 
hive.exec.orc.split.strategy=HYBRID; set 
hive.exec.reducers.bytes.per.reducer=258998272'. "
--- End diff --

I'm clearly not a Hive expert but does it make sense to give this kind of 
examples for post-queries?


---


[GitHub] nifi pull request #2695: NIFI-5044 SelectHiveQL accept only one statement

2018-05-12 Thread bdesert
GitHub user bdesert opened a pull request:

https://github.com/apache/nifi/pull/2695

NIFI-5044 SelectHiveQL accept only one statement

SelectHiveQL support only single SELECT statement.
This change adds support for pre- and post- select statements.
It will be useful for configuration queries, i.e. "set 
tez.queue.name=default", and others.

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [X] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [X] Does your PR title start with NIFI- where  is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.

- [X] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [X] Is your initial contribution a single, squashed commit?

### For code changes:
- [X] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
- [X] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?
- [X] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bdesert/nifi NIFI-5044_SelectHiveQL_

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2695.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2695


commit 5e4c2b00405418fc4673e851740129e94f59caab
Author: Ed B 
Date:   2018-05-13T05:24:09Z

NIFI-5044 SelectHiveQL accept only one statement

SelectHiveQL support only single SELECT statement.
This change adds support for pre- and post- select statements.
It will be useful for configuration queries, i.e. "set 
tez.queue.name=default", and others.




---