RE: SelectHiveQL Error

2016-10-09 Thread Nathamuni, Ramanujam
Hi Matt,

The approach looks good, and I would suggest using a similar approach for all the 
Hadoop-related processors too, though I am not sure how hard or easy it would be 
to get all the Hadoop-related connectors to use the same LIBPATH and CONF_PATH variables.

Thanks,
Ram 

-Original Message-
From: Matt Burgess [mailto:mattyb...@apache.org] 
Sent: Friday, October 07, 2016 10:56 PM
To: users@nifi.apache.org
Subject: Re: SelectHiveQL Error

Another approach might be to allow the Hive processors to use a 
DBCPConnectionPool instead of requiring a HiveConnectionPool (which is a 
subclass). That would either involve moving the extra method
(getConnectionUrl) to the DBCPService interface, or doing a check from the 
processors before calling getConnectionUrl() (which is used for provenance). 
With the former, HiveDBCPService would effectively just be a marker interface, 
and the implementation HiveConnectionPool would remain as-is (including a certain 
version of Hive, hardcoding the driver name, etc., for ease of use). Then if you 
wanted to Bring Your Own Hive, you could set up a normal DBCPConnectionPool, 
add the directory containing the Hive JARs, set the driver name to 
org.apache.hive.jdbc.HiveDriver, and then use that in the HiveQL processors.
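
For reference, a minimal sketch of what that Bring-Your-Own-Hive setup might 
look like on a standard DBCPConnectionPool (property names approximate; host, 
port, and path are placeholders):

    Database Connection URL:     jdbc:hive2://host:10000/default
    Database Driver Class Name:  org.apache.hive.jdbc.HiveDriver
    Database Driver Location(s): /path/to/your/hive/jars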

I'll give that a try shortly to see if it's a viable option (not sure whether the 
Hive NAR would pollute the classloader for a DBCPConnectionPool instantiated 
from a HiveQL processor). The ability to add additional driver JARs was added in 
NiFi 1.0.0 [1] precisely to support this kind of thing. However, it can't be 
used out of the box for Hive, because the SQL processors make JDBC API calls 
that the Hive JDBC driver doesn't support, and the HiveQL processors require a 
HiveConnectionPool. If we can merge the two concepts (using HiveQL processors 
with DBCPConnectionPool services), we might be in good shape.

Thoughts? Thanks,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-2604

On Fri, Oct 7, 2016 at 10:28 PM, Andy LoPresto <alopre...@apache.org> wrote:
> I don’t have all the background on this issue, but it might be 
> something where the solution moving forward (until the Extension 
> Registry is
> introduced) is to follow a similar path as the Kafka connectors, i.e.
> separate processors tied to each (incompatible) version of the library.
> Thoughts?
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Oct 7, 2016, at 6:26 PM, Nathamuni, Ramanujam <rnatham...@tiaa.org>
> wrote:
>
> Andrew,
>
> I agree, but the reality in an enterprise is that moving from one
> version to another is hard, and we also need provisions to support
> multiple versions of drivers; otherwise it will be a challenge. If we
> need to connect to multiple Hadoop clusters, they might be running
> different versions, and asking all of those Hadoop cluster owners to be
> on the same version is going to be a challenge.
>
> Just my opinion:-)
>
> Thanks,
> Ram
>
>
> 
> From: Andrew Grande
> Sent: Friday, October 07, 2016 6:14:07 PM
> To: users@nifi.apache.org
> Subject: Re: SelectHiveQL Error
>
> I remember this error; it basically means your Hive is too old.
> There's no way to make a generic Hive client; a line has to be drawn
> somewhere. Same as, e.g., a car that runs on premium gas won't work with regular.
>
> You need at least Hive 1.2.
>
> Andrew
>
>
> On Fri, Oct 7, 2016, 10:20 AM Nathamuni, Ramanujam 
> <rnatham...@tiaa.org>
> wrote:
>>
>> I have a similar client protocol issue. How can we make the Hive*
>> processors generic enough that users can point to a LIB directory
>> containing the JAR files for their Hadoop cluster?
>>
>>
>>
>> The SAS Hadoop Access connector uses the approach below, from their
>> Enterprise Guide.
>>
>>
>>
>> - Download the JAR files from the Hadoop cluster
>>
>> - Download the config files from the Hadoop cluster
>>
>>
>>
>> Export two configuration variables:
>>
>> export HADOOP_LIB_PATH=/opt/cdh/5.7.1/lib/ (which will have all the jar files)
>>
>> export HADOOP_CONFIG_PATH=/opt/cdh/5.7.1/conf/
>>
>>
>>
>> Can we have similar options on all the Hadoop-related processors?
>> That would make things work with all the different versions of Hadoop.
>>
>>
>>
>> Thanks,
>>
>> Ram
>>
>> From: Dan Giannone [mailto:dgiann...@humana.com]
>> Sent: Friday, October 07, 2016 9:49 AM
>>
>>
>> To: users@nifi.apache.org
>> Subject: RE: SelectHiveQL Error
>>
>>
>>
>> It turns out the port needed to be changed for HiveServer2 as well.

RE: SelectHiveQL Error

2016-10-07 Thread Nathamuni, Ramanujam
Andrew,

I agree, but the reality in an enterprise is that moving from one version to 
another is hard, and we also need provisions to support multiple versions of 
drivers; otherwise it will be a challenge. If we need to connect to multiple 
Hadoop clusters, they might be running different versions, and asking all of 
those Hadoop cluster owners to be on the same version is going to be a challenge.

Just my opinion:-)

Thanks,
Ram



From: Andrew Grande
Sent: Friday, October 07, 2016 6:14:07 PM
To: users@nifi.apache.org
Subject: Re: SelectHiveQL Error


I remember this error; it basically means your Hive is too old. There's no way 
to make a generic Hive client; a line has to be drawn somewhere. Same as, e.g., 
a car that runs on premium gas won't work with regular.

You need at least Hive 1.2.

Andrew

On Fri, Oct 7, 2016, 10:20 AM Nathamuni, Ramanujam 
<rnatham...@tiaa.org> wrote:

I have a similar client protocol issue. How can we make the Hive* processors 
generic enough that users can point to a LIB directory containing the JAR files 
for their Hadoop cluster?



The SAS Hadoop Access connector uses the approach below, from their Enterprise Guide.



- Download the JAR files from the Hadoop cluster

- Download the config files from the Hadoop cluster



Export two configuration variables:

export HADOOP_LIB_PATH=/opt/cdh/5.7.1/lib/ (which will have all the jar files)

export HADOOP_CONFIG_PATH=/opt/cdh/5.7.1/conf/



Can we have similar options on all the Hadoop-related processors? That would 
make things work with all the different versions of Hadoop.



Thanks,

Ram

From: Dan Giannone [mailto:dgiann...@humana.com]
Sent: Friday, October 07, 2016 9:49 AM

To: users@nifi.apache.org
Subject: RE: SelectHiveQL Error



It turns out the port needed to be changed for HiveServer2 as well. That 
seemed to fix the issue below. However, now I get:



> org.apache.thrift.TApplicationException: Required field 'client_protocol' is 
> unset!



Which, according to this 
<http://stackoverflow.com/questions/30931599/error-jdbc-hiveconnection-error-opening-session-hive>, 
indicates my Hive and hive-jdbc versions are mismatched. "hive --version" gives 
me 1.1.0. If I were to download the hive-jdbc 1.1.0 jar, is there a way I could 
specify that it use that?





-Dan



From: Dan Giannone [mailto:dgiann...@humana.com]
Sent: Friday, October 07, 2016 9:25 AM
To: users@nifi.apache.org
Subject: RE: SelectHiveQL Error



Hi Matt,



When I try to change to jdbc:hive2://, I get a different set of errors.



>Error getting Hive connection

>org.apache.commons.dbcp.SQLNestedException: Cannot create 
>PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
>jdbc:hive2://…)

>Caused by: java.sql.SQLException: Could not open client transport with JDBC 
>Uri: jdbc:hive2://…

>Caused by: org.apache.thrift.transport.TTransportException: null



I think you are right that it is an issue with my connection URL. Is there some 
command I can run that will generate it for me? Or is there a specific place I 
should look? The only mention of a URL in hive-site.xml that I see is:





<property>
  <name>hive.metastore.uris</name>
  <value>thrift://server:port</value>
</property>







-Dan



From: Matt Burgess [mailto:mattyb...@gmail.com]
Sent: Thursday, October 06, 2016 5:17 PM
To: users@nifi.apache.org
Subject: Re: SelectHiveQL Error



Andrew is correct. Although the HiveServer 1 driver is included with the NAR, 
the HiveConnectionPool is hardcoded to use the HiveServer 2 driver (since the 
former doesn't allow simultaneous connections, and we are using a connection 
pool :). The scheme should be jdbc:hive2://, not jdbc:hive://.
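
For example (host is a placeholder; 10000 is HiveServer2's default Thrift port):

    jdbc:hive2://host:10000/default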



If that was a typo and you are using the correct scheme, could you provide your 
configuration details/properties?



Thanks,

Matt



On Oct 6, 2016, at 4:07 PM, Andrew Grande <apere...@gmail.com> wrote:

Are you sure the JDBC URL is correct? IIRC, it was jdbc:hive2://

Andrew



On Thu, Oct 6, 2016, 3:46 PM Dan Giannone <dgiann...@humana.com> wrote:

Hi Matt,

Here is the whole error trace, starting from when I turned on the SelectHiveQL 
processor:

INFO [StandardProcessScheduler Thread-2] o.a.n.c.s.TimerDrivenSchedulingAgent 
Scheduled SelectHiveQL[id=0157102a-94da-11ec-0f7e-17fd3119aa00] to run with 1 
threads
2016-10-06 15:37:06,554 INFO [Timer-Driven Process Thread-7] 
o.a.nifi.dbcp.hive.HiveConnectionPool 
HiveConnectionPool[id=0157102d-94da-11ec-4d91-5a8952e888bd] Simple 
Authentication
2016-10-06 15:37:06,556 ERROR [Timer-Driven Process Thread-7] 
o.a.nifi.dbcp.hive.HiveConnectionPool 
HiveConnectionPool[id=0157102d-94da-11ec-4d91-5a8952e888bd] Error getting Hive 
connection
2016-10-06 15:37:06,557 ERROR [Timer-Driven Process Thread-7] 
o.a.nifi.dbcp.hive.HiveConnectionPool
org.apache.commons.dbcp.SQLNestedException: Cannot create JDBC driver of class 
'org.apache.hive.jdbc.HiveDriver' for connect URL 
'jdbc:hive://server:port/default'

RE: SelectHiveQL Error

2016-10-07 Thread Nathamuni, Ramanujam
I have a similar client protocol issue. How can we make the Hive* processors 
generic enough that users can point to a LIB directory containing the JAR files 
for their Hadoop cluster?

The SAS Hadoop Access connector uses the approach below, from their Enterprise Guide.


- Download the JAR files from the Hadoop cluster

- Download the config files from the Hadoop cluster

Export two configuration variables:

export HADOOP_LIB_PATH=/opt/cdh/5.7.1/lib/ (which will have all the jar files)
export HADOOP_CONFIG_PATH=/opt/cdh/5.7.1/conf/

Can we have similar options on all the Hadoop-related processors? That would 
make things work with all the different versions of Hadoop.

Thanks,
Ram
From: Dan Giannone [mailto:dgiann...@humana.com]
Sent: Friday, October 07, 2016 9:49 AM
To: users@nifi.apache.org
Subject: RE: SelectHiveQL Error

It turns out the port needed to be changed for HiveServer2 as well. That 
seemed to fix the issue below. However, now I get:

> org.apache.thrift.TApplicationException: Required field 'client_protocol' is 
> unset!

Which, according to this 
<http://stackoverflow.com/questions/30931599/error-jdbc-hiveconnection-error-opening-session-hive>, 
indicates my Hive and hive-jdbc versions are mismatched. "hive --version" gives 
me 1.1.0. If I were to download the hive-jdbc 1.1.0 jar, is there a way I could 
specify that it use that?


-Dan

From: Dan Giannone [mailto:dgiann...@humana.com]
Sent: Friday, October 07, 2016 9:25 AM
To: users@nifi.apache.org
Subject: RE: SelectHiveQL Error

Hi Matt,

When I try to change to jdbc:hive2://, I get a different set of errors.

>Error getting Hive connection
>org.apache.commons.dbcp.SQLNestedException: Cannot create 
>PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
>jdbc:hive2://…)
>Caused by: java.sql.SQLException: Could not open client transport with JDBC 
>Uri: jdbc:hive2://…
>Caused by: org.apache.thrift.transport.TTransportException: null

I think you are right that it is an issue with my connection URL. Is there some 
command I can run that will generate it for me? Or is there a specific place I 
should look? The only mention of a URL in hive-site.xml that I see is:


<property>
  <name>hive.metastore.uris</name>
  <value>thrift://server:port</value>
</property>



-Dan

From: Matt Burgess [mailto:mattyb...@gmail.com]
Sent: Thursday, October 06, 2016 5:17 PM
To: users@nifi.apache.org
Subject: Re: SelectHiveQL Error

Andrew is correct. Although the HiveServer 1 driver is included with the NAR, 
the HiveConnectionPool is hardcoded to use the HiveServer 2 driver (since the 
former doesn't allow simultaneous connections, and we are using a connection 
pool :). The scheme should be jdbc:hive2://, not jdbc:hive://.

If that was a typo and you are using the correct scheme, could you provide your 
configuration details/properties?

Thanks,
Matt


On Oct 6, 2016, at 4:07 PM, Andrew Grande <apere...@gmail.com> wrote:

Are you sure the JDBC URL is correct? IIRC, it was jdbc:hive2://

Andrew

On Thu, Oct 6, 2016, 3:46 PM Dan Giannone <dgiann...@humana.com> wrote:
Hi Matt,

Here is the whole error trace, starting from when I turned on the SelectHiveQL 
processor:

INFO [StandardProcessScheduler Thread-2] o.a.n.c.s.TimerDrivenSchedulingAgent 
Scheduled SelectHiveQL[id=0157102a-94da-11ec-0f7e-17fd3119aa00] to run with 1 
threads
2016-10-06 15:37:06,554 INFO [Timer-Driven Process Thread-7] 
o.a.nifi.dbcp.hive.HiveConnectionPool 
HiveConnectionPool[id=0157102d-94da-11ec-4d91-5a8952e888bd] Simple 
Authentication
2016-10-06 15:37:06,556 ERROR [Timer-Driven Process Thread-7] 
o.a.nifi.dbcp.hive.HiveConnectionPool 
HiveConnectionPool[id=0157102d-94da-11ec-4d91-5a8952e888bd] Error getting Hive 
connection
2016-10-06 15:37:06,557 ERROR [Timer-Driven Process Thread-7] 
o.a.nifi.dbcp.hive.HiveConnectionPool
org.apache.commons.dbcp.SQLNestedException: Cannot create JDBC driver of class 
'org.apache.hive.jdbc.HiveDriver' for connect URL 
'jdbc:hive://server:port/default'
at 
org.apache.commons.dbcp.BasicDataSource.createConnectionFactory(BasicDataSource.java:1452)
 ~[commons-dbcp-1.4.jar:1.4]
at 
org.apache.commons.dbcp.BasicDataSource.createDataSource(BasicDataSource.java:1371)
 ~[commons-dbcp-1.4.jar:1.4]
at 
org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
 ~[commons-dbcp-1.4.jar:1.4]
at 
org.apache.nifi.dbcp.hive.HiveConnectionPool.getConnection(HiveConnectionPool.java:269)
 ~[nifi-hive-processors-1.0.0.jar:1.0.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[na:1.8.0_45]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[na:1.8.0_45]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[na:1.8.0_45]
at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_45]

Idea needed to get XL from SharePoint using NiFi

2016-09-19 Thread Nathamuni, Ramanujam
Hello:

I need to get the Excel (XL) files from a Windows SharePoint 2013 site and 
store them as CSV files or load them into a database. Please suggest some 
things I could try.

Thanks,
Ram


RE: beginner question on destination failure

2016-09-15 Thread Nathamuni, Ramanujam
Thanks Mark,

It worked.


Thanks,
Ram



From: Mark Payne
Sent: Thursday, September 15, 2016 3:51:45 PM
To: users@nifi.apache.org
Subject: Re: beginner question on destination failure

Ram,

You can simply create a connection from PutHBase back to itself and select the 
'failure' relationship. This will cause failed FlowFiles to stay in the flow 
until you are able to push to HBase again.

Thanks
-Mark


On Sep 15, 2016, at 3:36 PM, Nathamuni, Ramanujam 
<rnatham...@tiaa.org> wrote:

Good Evening:

I have a flow whose destination writes to HBase. I want to design the flow so 
that if Hadoop (HBase) goes down for maintenance, the FlowFiles are queued. How 
do I design the relationships so that data gets queued and waits for HBase to 
become available again?

Thanks,
Ram




beginner question on destination failure

2016-09-15 Thread Nathamuni, Ramanujam
Good Evening:

I have a flow whose destination writes to HBase. I want to design the flow so 
that if Hadoop (HBase) goes down for maintenance, the FlowFiles are queued. How 
do I design the relationships so that data gets queued and waits for HBase to 
become available again?

Thanks,
Ram



RE: What next with NiFi

2016-09-06 Thread Nathamuni, Ramanujam
Hi Joe and Team:

Enterprises need very high data security and audit trails. What is the vision 
for having the canvas, processors, process groups, etc. isolated per 
project/LOB?

Thanks,
Ram 

-Original Message-
From: Joe Witt [mailto:joe.w...@gmail.com] 
Sent: Tuesday, September 06, 2016 9:12 AM
To: users@nifi.apache.org
Subject: Re: What next with NiFi

Gunjan

The best indicators of areas of focus, I think, are listed here: 
https://cwiki.apache.org/confluence/display/NIFI/NiFi+Feature+Proposals

In a roadmap discussion thread back in January of this year, the items 
mentioned specifically as trailing the 1.0 release were the extension and 
variable registries.

Which items will get addressed in which order is perhaps less precise and more 
a function of where contributions occur.

Thanks
Joe

On Tue, Sep 6, 2016 at 9:03 AM, Gunjan Dave  wrote:
> Hello NiFi team, now that version 1.0.0 is out in the open, what are the
> next big-ticket plans?
> I saw the 1.1.0 JIRAs, but those are mostly bug fixes and minor
> enhancements; what are the larger plans from the existing roadmap?
>



RE: ExecuteScript Processor - Control Flow

2016-08-29 Thread Nathamuni, Ramanujam
I have a similar question. I have an ExecuteScript processor that runs Python 
code and produces an output file (/tmp/test.xml), but I am not sure how to pass 
that file to the next processor without adding a GetFile processor just to pick 
up the file the script produced. I am very new to NiFi.

What I need is the following (see the sketch below):

1. Read a CSV file from HDFS.

2. Execute a Python script that reads the CSV file and produces an XML output 
file, e.g. /tmp/test.xml.

3. Process the /tmp/test.xml file using the SplitXML processor.

4. Put the results into HDFS.
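
As a minimal sketch of the usual ExecuteScript pattern, the Jython script below 
rewrites the FlowFile content in-stream, so the intermediate /tmp/test.xml and 
the extra GetFile step are not needed. The session and REL_SUCCESS bindings are 
provided by ExecuteScript; the CSV-to-XML mapping shown is illustrative only:

    from org.apache.commons.io import IOUtils
    from java.nio.charset import StandardCharsets
    from org.apache.nifi.processor.io import StreamCallback

    # Rewrite the incoming CSV content as XML, in-stream.
    class CsvToXml(StreamCallback):
        def process(self, inputStream, outputStream):
            text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
            xml = ['<rows>']
            for line in text.splitlines():
                if not line.strip():
                    continue
                cols = ''.join('<col>%s</col>' % c for c in line.split(','))
                xml.append('  <row>%s</row>' % cols)
            xml.append('</rows>')
            outputStream.write(bytearray('\n'.join(xml).encode('utf-8')))

    flowFile = session.get()
    if flowFile is not None:
        flowFile = session.write(flowFile, CsvToXml())
        session.transfer(flowFile, REL_SUCCESS)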


Thanks,
Ram
From: James Wing [mailto:jvw...@gmail.com]
Sent: Monday, August 29, 2016 12:47 AM
To: users@nifi.apache.org
Subject: Re: ExecuteScript Processor - Control Flow

Koustav,
How are you running the Sqoop job? Can you share some code? Python is 
sequential by default, but your Sqoop job might run asynchronously. I believe 
the answer depends on your code (or library) not only starting the Sqoop job 
but also polling for its status until it is complete.
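
A minimal sketch of that polling idea, assuming a hypothetical check_status() 
helper that queries the remote server for the job state:

    import time

    def wait_for_completion(job_id, poll_interval_secs=10):
        # Block until the remotely started Sqoop job reaches a terminal state.
        while True:
            status = check_status(job_id)  # hypothetical helper
            if status in ('SUCCEEDED', 'FAILED'):
                return status
            time.sleep(poll_interval_secs)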
Thanks,
James

On Sun, Aug 28, 2016 at 8:24 PM, koustav choudhuri wrote:
Hi All

I have a Python script running on a NiFi server, which in turn calls a Sqoop 
job on a different server. The next step in the script is to use the flow file 
from the previous processor to continue to the next processor.

So the python script is like :

1. call the sqoop job on server 2
2. get the flow file from the session and continue


Question:
Will step 2 wait until step 1 completes?
Or, as soon as the Sqoop job gets initiated through step 1, does step 2 execute 
regardless of whether step 1 completes?

Could be a dumb question, but still asking.





NiFi processor to convert CSV to XML

2016-08-25 Thread Nathamuni, Ramanujam
Hello All,

I am looking for a processor to convert a CSV file to XML. I looked at the 
available processors, but I do not see one for CSV to XML. Is there any 
workaround using other processors to do this job? Or could someone write a new 
processor for this function?


Thanks,
Ram