Re: GetHDFS from Azure Blob

2017-03-28 Thread Austin Heyne
Thanks Bryan, We're only working with one account here but with multiple root level containers. e.g. wasb://c...@accountname.blob.core.windows.net/ wasb://x...@accountname.blob.core.windows.net/ wasb://j...@accountname.blob.core.windows.net/ The thing that stands out to me the most is why

Re: GetHDFS from Azure Blob

2017-03-28 Thread Bryan Bende
Austin, I think you are correct that it's <container>@<account>, I hadn't looked at this config in a long time and was reading too quickly before :) That would line up with the other property fs.azure.account.key.<account>.blob.core.windows.net where you specify the key for that account. I have no idea if this will work,

Re: GetHDFS from Azure Blob

2017-03-28 Thread Austin Heyne
Bryan, So I initially didn't think much of it (assumed it was a typo, etc.) but you've said that the access URL for wasb that you've been using is wasb://YOUR_USER@YOUR_HOST/. However, this has never worked for us and I'm wondering if we have a different configuration somewhere. What we have to

Re: GetHDFS from Azure Blob

2017-03-28 Thread Bryan Bende
Austin, I believe the default FS is only used when you write to a path that doesn't specify the filesystem. Meaning, if you set the directory of PutHDFS to /data then it will use the default FS, but if you specify wasb://user@wasb2/data then it will go to /data in a different filesystem. The
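
The behavior Bryan describes can be illustrated with a small Python sketch. This is only a simplified model of the rule "a path without a filesystem falls back to the default FS", not Hadoop's actual resolution code; the `resolve` helper and the `hdfs://namenode:8020` default are hypothetical names chosen for the example.

```python
from urllib.parse import urlparse


def resolve(directory, default_fs):
    """Return the fully qualified URI that a directory setting would refer to.

    Simplified model only: a directory that names its own filesystem is used
    as-is, and a bare path is placed on the default filesystem.
    """
    parsed = urlparse(directory)
    if parsed.scheme:
        # The directory specifies a filesystem, so the default FS is not used.
        return directory
    # No scheme given: interpret the path on the default filesystem.
    return default_fs.rstrip("/") + "/" + directory.lstrip("/")


DEFAULT_FS = "hdfs://namenode:8020"  # hypothetical fs.defaultFS value

print(resolve("/data", DEFAULT_FS))
print(resolve("wasb://user@wasb2/data", DEFAULT_FS))
```

Under this model, only the first call depends on the default FS; the second keeps the wasb location it was given.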

Re: GetHDFS from Azure Blob

2017-03-28 Thread Austin Heyne
Hi Andre, Yes, I'm aware of that configuration property, it's what I have been using to set the core-site.xml and hdfs-site.xml. For testing this I didn't modify the core-site located in the HADOOP_CONF_DIR but rather copied and modified it and then pointed the processor to the copy. The

Re: GetHDFS from Azure Blob

2017-03-28 Thread Andre
Austin, Perhaps that wasn't explicit but the settings don't need to be system wide, instead the defaultFS may be changed just for a particular processor, while the others may use configurations. The *HDFS processor documentation mentions it allows you to set particular Hadoop configurations:

Re: Pulling API Endpoints into Kafka Topics in Avro

2017-03-28 Thread Steve Champagne
Ah, that worked great! I hadn't known about the Avro map type. Thanks!  On Tue, Mar 28, 2017 at 11:51 AM, James Wing wrote: > Steve, > > The inferred schemas can be helpful to get you started, but I recommend > providing your own Avro schema based on your knowledge of what

Re: GetHDFS from Azure Blob

2017-03-28 Thread Austin Heyne
Thanks Bryan, Working with the configuration you sent what I needed to change was to set the fs.defaultFS to the wasb url that we're working from. Unfortunately this is a less than ideal solution since we'll be pulling files from multiple wasb urls and ingesting them into an Accumulo

Re: GetHDFS from Azure Blob

2017-03-28 Thread Bryan Bende
Austin, Can you provide the full error message and stacktrace for the IllegalArgumentException from nifi-app.log? When you start the processor it creates a FileSystem instance based on the config files provided to the processor, which in turn causes all of the corresponding classes to load.

GetHDFS from Azure Blob

2017-03-28 Thread Austin Heyne
Hi all, Thanks for all the help you've given me so far. Today I'm trying to pull files from an Azure blob store. I've done some reading on this and from previous tickets [1] and guides [2] it seems the recommended approach is to place the required jars, to use the HDFS Azure protocol, in

Re: Pulling API Endpoints into Kafka Topics in Avro

2017-03-28 Thread James Wing
Steve, The inferred schemas can be helpful to get you started, but I recommend providing your own Avro schema based on your knowledge of what should be guaranteed to downstream systems. If you want to pass untyped data, you can't really beat JSON. Avro schema isn't so bad, honest. As part of
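
As a rough illustration of a hand-written schema, the sketch below builds an Avro record schema in Python that uses the Avro map type mentioned elsewhere in this thread. The record name and field names (`ApiResponse`, `id`, `attributes`) are hypothetical and not taken from the original messages; only the `{"type": "map", "values": ...}` form comes from the Avro specification.

```python
import json

# Hypothetical record: one typed field plus a map for loosely structured data.
schema = {
    "type": "record",
    "name": "ApiResponse",  # hypothetical record name
    "fields": [
        {"name": "id", "type": "string"},
        # An Avro map has string keys; "values" gives the type of each value.
        {"name": "attributes", "type": {"type": "map", "values": "string"}},
    ],
}

# Serialize to the JSON text that an Avro schema setting would expect.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

A map field like this keeps the guaranteed fields typed while still allowing a variable set of keys per record.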

Re: ExecuteScript once at workflow inception

2017-03-28 Thread James McMahon
Thank you Matt. I am not sure I fully understand how to do this in Python yet, but am going to try and look closely at your example and see if I can get something working. -Jim On Tue, Mar 28, 2017 at 11:00 AM, Matt Burgess wrote: > Jim, > > You can use

Re: Connection cannot be established

2017-03-28 Thread Aldrin Piri
Hi Otmane! Dropping dev to BCC. We do not currently have support for this in the processor as currently released, and it is an outstanding JIRA issue [1]. On that issue, a community member has an implementation that may work for you but also seems dependent on the core library of the processor

Re: ExecuteScript once at workflow inception

2017-03-28 Thread Bryan Rosander
That would be the general idea, you'd probably need to create a Controller Service interface and implementation [1] that would take the result and write it out to a file. The filename could be part of the method signature. Another alternative would be to use NiFi's logging framework and configure

Re: ExecuteScript once at workflow inception

2017-03-28 Thread James McMahon
Thank you Bryan. So would the Controller Service serve as an interface through which I direct log messages to a log file that I stipulate? Similar to how we can set up different SSL Context Services that relate to different cert authorities? If so, then that would help. Let me describe my

Re: ExecuteScript once at workflow inception

2017-03-28 Thread Matt Burgess
Jim, You can use InvokeScriptedProcessor [1] rather than ExecuteScript for this. ExecuteScript basically lets you provide an onTrigger() body, which is called when the ExecuteScript processor "has work to do". None of the other lifecycle methods are available. For InvokeScriptedProcessor, you

Re: ExecuteScript once at workflow inception

2017-03-28 Thread Bryan Rosander
Hey James, I wonder if you'd be better suited with a Controller Service that could provide access to configured loggers, etc. It looks like ExecuteScript can look up Controller Services [1]. A script-based Controller Service implementation (so you could use Python or another scripting language

ExecuteScript once at workflow inception

2017-03-28 Thread James McMahon
Hello. I am interested in calling a python script from ExecuteScript that sets up Python loggers and establishes file handles to those loggers for use by other python scripts called later in the workflow by other ExecuteScript processors. Is there a means to execute a script at workflow inception
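
For context, here is a minimal sketch of the kind of logger setup being described, written as plain Python. It assumes nothing about NiFi: whether logger state persists between ExecuteScript invocations depends on the script engine and is not confirmed here. The helper name `get_file_logger`, the logger name, and the log path are hypothetical. The guard on `logger.handlers` is there so that repeated calls do not open an additional file handle each time.

```python
import logging
import os
import tempfile


def get_file_logger(name, path):
    """Return a named logger that writes to path, attaching the handler once."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        # Only open the file the first time this logger is configured.
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


# Hypothetical log location used only for this example.
log_path = os.path.join(tempfile.gettempdir(), "workflow_example.log")

first = get_file_logger("workflow.example", log_path)
second = get_file_logger("workflow.example", log_path)  # reuses the same handler
first.info("setup complete")
```

Because `logging.getLogger` returns the same object for the same name within one interpreter, the second call adds no new handler and therefore no new open file.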

Pulling API Endpoints into Kafka Topics in Avro

2017-03-28 Thread Steve Champagne
I'm in the process of creating an ingest workflow that will pull into Kafka topics a number of API endpoints on an hourly basis. I'd like to convert them from JSON to Avro when I bring them in. I have, however, run into a few problems that I haven't been able to figure out and haven't turned anything

Re: Cannot Restart Nifi

2017-03-28 Thread Joe Witt
I do agree. Glad you're back on track. On Tue, Mar 28, 2017 at 9:44 AM, James McMahon wrote: > *yikes*! Message received. We have now done all but one of them. That one > being Set How long Sockets Stay in a TIMED_WAIT State When Closed. According > to our system

Re: Cannot Restart Nifi

2017-03-28 Thread James McMahon
*yikes*! Message received. We have now done all but one of them. That one being Set How long Sockets Stay in a TIMED_WAIT State When Closed. According to our system administrator we are unable to do this because - and I'm paraphrasing here - we do not have all the necessary components or libraries

Re: Cannot Restart Nifi

2017-03-28 Thread Joe Witt
Jim - definitely take the time to walk through the best practices guide. Some are more like "if you don't do this it will probably kill the process - practices". On Tue, Mar 28, 2017 at 9:27 AM, James McMahon wrote: > I have been able to bring Nifi UI back up with this

Re: Cannot Restart Nifi

2017-03-28 Thread James McMahon
I have been able to bring Nifi UI back up with this change to the limit on number of open files. Thank you all very much for your help and insights. -Jim On Tue, Mar 28, 2017 at 8:51 AM, James McMahon wrote: > Thank you Aldrin. I do have AutoResumeState set to false

Re: Cannot Restart Nifi

2017-03-28 Thread Aldrin Piri
Jim, In terms of trying to ease NiFi at start up, you could also try setting nifi.flowcontroller.autoResumeState to false in your nifi.properties. Depending on how your flow and scripts are constructed, this may allow you to piecewise alleviate any large queues/processing of files that could be

Re: Access Denied after initial setup

2017-03-28 Thread Joe Witt
Thanks for following up with the resolution! On Tue, Mar 28, 2017 at 8:43 AM, Bram vd Klinkenberg wrote: > Solved it! > > I used CN=admin with the toolkit and cn=admin in the users.xml. Changed > this to CN=admin and works fine now :).

Re: Access Denied after initial setup

2017-03-28 Thread Bram vd Klinkenberg
Solved it! I used CN=admin with the toolkit and cn=admin in the users.xml. Changed this to CN=admin and works fine now :). From: Bram vd Klinkenberg Sent: Tuesday, 28 March 2017 12:59 To: users@nifi.apache.org Subject: FW: Access Denied after initial

Re: Cannot Restart Nifi

2017-03-28 Thread Aldrin Piri
Hi Jim, Apologies for the terse response earlier, was typing from phone. I am assuming you are on a Linux system. First and foremost, do check out the Sys Admin guide [1]. In particular, scope out the best practices [2] for configuration which will have you increase your open file handles. I do

Re: Cannot Restart Nifi

2017-03-28 Thread James McMahon
Hi Aldrin. Yes sir, of course: my environment is NiFi v0.7. I have my content, flowfile, and provenance repositories on separate independent disk devices. In my nifi.properties file, nifi.flowfile.repository.partitions equals 256, and always.sync is false. My nifi.queue.swap.threshold is 2.

FW: Access Denied after initial setup

2017-03-28 Thread Bram vd Klinkenberg
Hi, I have some issues after initial setup and securing NiFi. I have set up a CentOS6 (including java) machine with hostname nifi.domeinbram.nl. I downloaded NiFi and the tls toolkit and extracted them to /opt. I ran nifi.sh install and started the nifi service. After the initial setup of

Re: Cannot Restart Nifi

2017-03-28 Thread Aldrin Piri
Hi Jim, In getting to the root cause, could you please provide information on your environment? Did you apply the best practices listed in the System Administrator's guide? Could you provide some details on what your scripts are doing? If the data is not of importance, removing the Flowfile

Re: Cannot Restart Nifi

2017-03-28 Thread James McMahon
I noticed, too, that I have many partitions, partition-0 to partition-255 to be exact. These all have journal files in them. So I suspect that the journal file I cited is not specifically the problem in and of itself, but instead is the point where the allowable open files threshold is reached.
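
One quick way to see the open-file threshold in question is to read the process limit from Python. This is a Unix-only sketch using the standard `resource` module; the `exceeds_limit` helper and the example counts passed to it are hypothetical, and the sketch makes no claim about how many files NiFi actually holds open per partition.

```python
import resource

# Soft and hard limits on open file descriptors for the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open file limit: soft=%s hard=%s" % (soft, hard))


def exceeds_limit(open_files_needed, soft_limit):
    """Return True if the number of files needed is above the soft limit."""
    if soft_limit == resource.RLIM_INFINITY:
        # No limit is enforced, so nothing can exceed it.
        return False
    return open_files_needed > soft_limit


# Hypothetical counts, compared against the limit of the current process.
for needed in (256, 5000):
    print(needed, "exceeds limit:", exceeds_limit(needed, soft))
```

The value printed here is for the Python process itself; the limit that matters is the one in effect for the user and shell that start NiFi.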

Cannot Restart Nifi

2017-03-28 Thread James McMahon
While trying to use Python logging from two scripts I call via two independent ExecuteScript processors, I seem to have inadvertently created a condition where I have too many files open. This is causing a serious challenge for me, because when I attempt to start nifi (v0.7.1) it fails. The log