Re: ListS3 on very large buckets

2020-02-06 Thread Jerry Vinokurov
I'd say the thing you're going to be most concerned with is running out of
memory, since it's going to produce one flow file per item in your listing.
Is there any sort of sensible prefix structure that partitions your files?
If there is, I would have some sort of iterative logic that constructs the
prefix path, lists the files under that path, and then perhaps combines
them into a single file whose body is the list of the files themselves, or
alternatively processes them in batches.
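As a rough sketch of the batching idea outside NiFi (the boto3 paginator mentioned in the comment is an assumption about how the keys would be listed; `batched` is a hypothetical helper, not a NiFi feature):

```python
from itertools import islice

def batched(keys, batch_size):
    """Group an iterable of S3 keys into lists of at most batch_size,
    so downstream processing sees one batch rather than one item each."""
    it = iter(keys)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk

# In a real flow the keys would come from something like boto3's
# list_objects_v2 paginator, one prefix at a time, e.g.:
#   for page in s3.get_paginator("list_objects_v2").paginate(
#           Bucket="my-bucket", Prefix=prefix):
#       yield from (obj["Key"] for obj in page.get("Contents", []))

keys = ["data/a.csv", "data/b.csv", "data/c.csv", "data/d.csv", "data/e.csv"]
batches = list(batched(keys, 2))
print(batches)
# [['data/a.csv', 'data/b.csv'], ['data/c.csv', 'data/d.csv'], ['data/e.csv']]
```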

On Thu, Feb 6, 2020 at 12:41 PM Mike Thomsen  wrote:

> We may soon have to pull down potentially tens of millions or more
> objects from a bucket in one big pull. Are there any best practices
> for handling this with ListS3? Last time I used it, I recall that on the
> initial pull it would just keep going even if you scheduled a stop request
> on it, but that might have been just a bad perception on my part.
>
> Thanks,
>
> Mike
>


-- 
http://www.google.com/profiles/grapesmoker


Re: Validate CSV/Records by name instead of position

2019-09-18 Thread Jerry Vinokurov
This certainly works. You can create a schema registry and define an Avro
schema listing your fields. Then make sure the reader is configured to read
the header, so that it knows which fields go where in the record; set the
schema access strategy to read from the registry you created; and set the
name of the actual schema, which is a property on the schema registry. This
will correctly validate your CSV for you.
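For illustration only (this is plain Python rather than the record reader itself, and the field names and rows below are invented), by-name rather than by-position validation looks roughly like this:

```python
import csv
import io

REQUIRED = {"id", "name"}  # mandatory fields, identified by name

def validate_csv(text):
    """Read a CSV by header name and check that mandatory fields are
    present and non-empty, regardless of column order."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        raise ValueError("missing columns: %s" % sorted(missing))
    rows = list(reader)
    for i, row in enumerate(rows):
        for field in REQUIRED:
            if not row[field]:
                raise ValueError("row %d: empty %r" % (i, field))
    return rows

# The mandatory columns appear in a different order than another
# ingested file might use, but validation still succeeds:
rows = validate_csv("name,extra,id\nalice,x,1\nbob,y,2\n")
print(rows[0]["id"])  # 1
```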

On Wed, Sep 18, 2019 at 9:59 AM Eric Chaves  wrote:

> Hi folks,
>
> Is it possible to validate fields/columns in a Record or CSV by name
> instead of by position? For example, I have a record with two mandatory
> fields and some optional fields, but they may be in a different position in
> each ingested file. Should I use a script, or is there already a processor
> that could help me out with this?
>
> Regards,
>


-- 
http://www.google.com/profiles/grapesmoker


Re: feature suggestion

2019-06-18 Thread Jerry Vinokurov
Hi Wyllys,

One way that I solve this problem in my work is to use the AttributesToJSON
processor to form the body of the POST before sending it. Granted, that
does override the body of the flowfile, so I'm not sure if that works for
your specific case, but it does allow you to select which attributes you
want to turn into JSON key/value pairs. For more complex formations I would
suggest checking out the JOLT Transform processor, which can be quite
powerful, if somewhat painful to work with.

Jerry
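As an outside-NiFi sketch of what the attribute selection amounts to (the attribute names and the `attributes_to_json` helper are invented for illustration):

```python
import json

def attributes_to_json(attributes, selected):
    """Build a JSON request body from a chosen subset of flowfile
    attributes, roughly what AttributesToJSON does when you list the
    attributes you want included."""
    return json.dumps({k: attributes[k] for k in selected if k in attributes})

# A token attribute stays out of the body; only the selected keys go in.
attrs = {"filename": "report.csv", "token": "secret", "source": "s3"}
body = attributes_to_json(attrs, ["filename", "source"])
print(body)  # {"filename": "report.csv", "source": "s3"}
```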

On Tue, Jun 18, 2019 at 9:49 PM Wyllys Ingersoll <
wyllys.ingers...@keepertech.com> wrote:

> Andy -
>    Yes, I'm referring to the PUT and POST methods, in which case the
> processor just sends the entire flowfile as a JSON object in the message
> body.  I'd prefer to either have the option to exclude some of the flow
> attributes or (even better) have the ability to craft my own message body.
> There are lots of instances where the receiver of the PUT or POST expects a
> particular structure that doesn't easily work with just a flat JSON-ified
> set of flow attributes.
>
> One example:
>   We have a flow that has an authentication token as one of the attributes
> and a bunch of other key/value pairs used for other purposes.  In the
> InvokeHTTP processor, I use the auth token attribute in an Authorization
> header by creating a dynamic attribute with the correct format
> ("Authorization: Bearer ${token}").  However, the recipient of the PUT/POST
> also expects the body of the request to be formatted a specific way and
> does not expect or want to see the auth token or some of the other
> unrelated information that ends up getting transmitted as part of the
> message body simply because they are flow attributes.  So, if InvokeHTTP
> were able to exclude certain fields from the message body AND also allow
> the body of the message to be configurable into a structure other than just
> a flat dictionary of flow attributes, it would be much more powerful and
> useful.  As it stands, I'm thinking I may have to develop a custom
> processor to get past this issue, which is not ideal at all.
>
> Thanks!
>   Wyllys Ingersoll
>
>
>
>
> On Tue, Jun 18, 2019 at 8:34 PM Andy LoPresto 
> wrote:
>
>> Hi Wyllys,
>>
>> Sorry to hear you are having trouble with this processor. Can you please
>> provide a more detailed example of an incoming flowfile and what your
>> expected output is compared to what is currently happening? Based on my
>> understanding, the flowfile attributes are only sent as request headers,
>> and that can be controlled using the regular expression value of
>> “Attributes to Send”. I believe only the flowfile content is sent as the
>> request body when using PUT or POST. Thanks.
>>
>>
>> Andy LoPresto
>> alopre...@apache.org
>> *alopresto.apa...@gmail.com *
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>
>> On Jun 18, 2019, at 3:01 PM, Wyllys Ingersoll <
>> wyllys.ingers...@keepertech.com> wrote:
>>
>>
>> It would be nice to be able to exclude attributes from the message body
>> when doing a PUT or POST in the invokeHTTP processor.  Currently, there's
>> no way to selectively choose which attributes go into the message body
>> without using a replaceText processor in front of it, but that completely
>> removes those attributes from being used later, which is not always
>> desirable.
>>
>> There are lots of situations where it would be nice to be able to use
>> some flow attributes just in the request header or for other parts of the
>> processor configuration without including them in the message body itself.
>> For example, taking an authentication token that is carried along as a flow
>> attribute so it can be used in an authorization header and NOT included in
>> the body of the request.
>>
>> In general, invokeHTTP needs to allow for more control over the format
>> and content of the message body being sent.
>>
>> -Wyllys Ingersoll
>>
>>
>>

-- 
http://www.google.com/profiles/grapesmoker


Re: Implementing Gates with the Wait and Notify Processors

2019-04-24 Thread Jerry Vinokurov
My understanding of how Wait/Notify worked was that the Wait processor
would look for the count in the target signal to reach a specific value, at
which point it would open and let through any flowfiles that were waiting.
I'm not sure if the target value that it's looking for is actually a delta
between whatever the cached value is and some internally-stored state that
Wait maintains; it would make sense to me if that were the case because it
would eliminate the need to keep resetting the counter. But I'm fairly
certain that this will not work the way you describe it, where the
running_count property has to be set to something on the flowfile in order
for the flowfile to go through.

On Tue, Apr 23, 2019 at 10:39 PM Shawn Weeks 
wrote:

> Running into some additional inconsistencies. I’m under the impression
> that only the Notify Processor can increment the counter. In an example I’m
> testing, I have a Notify that sets the signal count “running_count” to zero
> and then immediately after it a Notify that increments by 2, yet when
> I run the groovy cache dump script from
> http://funnifi.blogspot.com/2016/04/inspecting-your-nifi.html I can see
> “running_count” may be set to something much higher, like 50 or 60. Another
> inconsistency seems to be that the Wait Processor lets things through that
> don’t match the Target Signal Count. For example, if the target signal
> count is 1, then it’s letting things through that have a target signal
> count of 2? I’ve got to be missing something rather obvious.
>
>
>
> Thanks
>
> Shawn
>
>
>
> *From:* Shawn Weeks 
> *Sent:* Tuesday, April 23, 2019 8:32 PM
> *To:* users@nifi.apache.org
> *Subject:* Implementing Gates with the Wait and Notify Processors
>
>
>
> I’m working to implement a flow where for a given source of data I can
> only be processing one set at a time due to external dependencies. Each set
> needs to go through several different steps so this isn’t just a matter of
> limiting concurrency for a single processor. I’m trying to implement this
> using Wait and Notify as a gate, and I’ve run into a couple of limitations
> that I’m not sure how to get around. I first set my Wait processors to wait
> for a specific counter to be reset to zero before allowing a data source
> through, but I quickly discovered that the Wait Processor tries to divide
> the signal counter, leading to a divide-by-zero error. I’m assuming that if
> zero isn’t a valid value for the signal counter we should disallow it;
> however, since you can’t use the Notify processor to set arbitrary values
> other than zero, I’m not sure how you’re supposed to make a 0-or-1 gate.
> Are you supposed to have two Notify Processors back to back, where one
> resets the counter to zero and the next increments by one? That seems a
> bit clunky.
>
>
>
> Thoughts?
>
>
>
> Thanks
>
> Shawn Weeks
>


-- 
http://www.google.com/profiles/grapesmoker


Re: Implementing Gates with the Wait and Notify Processors

2019-04-23 Thread Jerry Vinokurov
Hi Shawn,

I think it's a hard question to answer in the general sense. A lot depends
on your specific implementation. What are you using to originate your flow?
Is it a GenerateFlowFile processor, a List, or something else? There are a
couple of different strategies that I've discovered for achieving gating
with Wait/Notify pairs; one of them is having a single flowfile where each
line is some item that you need to process, which is then split line by
line. When you do that, you end up with the fragment.identifier and
fragment.count properties on all the resulting split flowfiles and then you
can feed those into the Wait/Notify logic. Another trick has been that if
you know a single flowfile is going to be somehow transformed into multiple
ones, you can tack on an identifier like a UUID or some other sort of thing
and use that as the identifier in Wait/Notify. I have also built out nested
loops with two pairs as you described, but in general that tends to be
quite cumbersome.

Jerry

On Tue, Apr 23, 2019 at 9:32 PM Shawn Weeks 
wrote:

> I’m working to implement a flow where for a given source of data I can
> only be processing one set at a time due to external dependencies. Each set
> needs to go through several different steps so this isn’t just a matter of
> limiting concurrency for a single processor. I’m trying to implement this
> using Wait and Notify as a gate, and I’ve run into a couple of limitations
> that I’m not sure how to get around. I first set my Wait processors to wait
> for a specific counter to be reset to zero before allowing a data source
> through, but I quickly discovered that the Wait Processor tries to divide
> the signal counter, leading to a divide-by-zero error. I’m assuming that if
> zero isn’t a valid value for the signal counter we should disallow it;
> however, since you can’t use the Notify processor to set arbitrary values
> other than zero, I’m not sure how you’re supposed to make a 0-or-1 gate.
> Are you supposed to have two Notify Processors back to back, where one
> resets the counter to zero and the next increments by one? That seems a
> bit clunky.
>
>
>
> Thoughts?
>
>
>
> Thanks
>
> Shawn Weeks
>


-- 
http://www.google.com/profiles/grapesmoker


Re: threads not terminating correctly

2019-04-09 Thread Jerry Vinokurov
Bryan and Joe, thanks for your tips. I will capture some dumps next time
this happens and see if I can narrow down what the problem is.
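The stuck-thread heuristic described below — same thread name showing the same stack trace across several consecutive dumps — can be sketched as a comparison over already-parsed dumps (parsing real `jstack` output is omitted; the dump dictionaries and thread names here are invented):

```python
def stuck_threads(dumps):
    """Given a list of thread dumps, each a dict mapping thread name to
    stack trace, return the names whose stack is identical in every
    dump -- candidates for truly stuck threads."""
    if not dumps:
        return set()
    # Only consider threads present in every dump.
    common = set(dumps[0])
    for d in dumps[1:]:
        common &= set(d)
    return {name for name in common
            if all(d[name] == dumps[0][name] for d in dumps[1:])}

# Three dumps taken at intervals; Timer-1 moves, ListS3-thread does not.
dump1 = {"ListS3-thread": "at S3Client.list", "Timer-1": "at sleep"}
dump2 = {"ListS3-thread": "at S3Client.list", "Timer-1": "at poll"}
dump3 = {"ListS3-thread": "at S3Client.list", "Timer-1": "at sleep"}
print(stuck_threads([dump1, dump2, dump3]))  # {'ListS3-thread'}
```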

On Tue, Apr 9, 2019 at 10:48 AM Joe Witt  wrote:

> Hello
>
> You could see hung threads like this because the processor is simply
> taking a long time to do its task (possible in certain listing cases but
> probably not ListS3), or because it is truly stuck, such as in a live-lock
> or timeout condition it has hit.  These are almost always bugs and avoidable.
>
> When you believe you have a stuck thread it is best to take a series of
> thread dumps, say three of them at 10-20 second intervals.  If there is a
> stuck thread it will be pretty clear to see, as you'll see the same stack
> trace and same thread name in each thread dump.  In the thread dumps,
> search on the class name of the processor you think is stuck.
>
> Also, it was discovered that in conditions such as certain JVM Error based
> exceptions at runtime we could lose threads.  This is different than what
> you describe but it is worth grabbing NiFi 1.9.2 which I'll send an
> announce thread for shortly.
>
> Lastly, be sure you have given your flow controller enough threads.  While
> the number of concurrent tasks showing on a processor that is stuck isn't
> directly related to what I'm mentioning, ensuring you have sufficient
> threads available in the flow overall is important as well.
>
> Thanks
>
> On Tue, Apr 9, 2019 at 10:18 AM Jerry Vinokurov 
> wrote:
>
>> Hi all,
>>
>> I have observed some odd behavior in our NiFi system and I was hoping
>> that someone might have insight into what was causing this. We are running
>> NiFi 1.8.0 on a single EC2 m5.2xlarge instance and we have a lot of flows
>> that run on it, I would say somewhere around 25. The flows do not all run
>> at the same time but are scheduled to fire at various times throughout the
>> day, so only about a quarter or so of them are ever actually running at
>> once.
>>
>> The behavior I have observed is that certain processors will indicate
>> that they are running a thread but that thread will never properly
>> terminate. An example would be something like ListS3, where one would
>> expect the processor to fire, check for new files, and either produce
>> flowfiles from the new data or do nothing. Instead what happens is that the
>> thread icon on the processor will indicate that it's doing something, but
>> will never disappear, and the processor will essentially "lock up" and
>> become inoperable. If we terminate the thread manually and just restart the
>> processor, it will repeat this behavior and the only thing that solves the
>> problem is a NiFi restart, which obviously we would like to avoid.
>>
>> I don't have a systematic way of generating this behavior but I've seen
>> it happen on ListS3, FetchS3Object, and ListenHTTP processors, at least. It
>> may also happen on others but I'm not sure and I don't know how to catch
>> it. My questions are:
>>
>> 1. What could be causing this behavior? Would this be a memory issue or
>> too many files opened? We've encountered those kinds of issues before but
>> the symptoms I'm used to seeing with those issues are not present here.
>> 2. Is there any way to catch when this is happening? I wouldn't expect
>> any processor that isn't doing some kind of heavy lifting (e.g. fetching or
>> processing large amounts of data) to be active for more than a few seconds
>> at most, but I'm not sure how to track this sort of thing, or whether it's
>> possible at all.
>> 3. How can we avoid it, short of restarting our NiFi regularly?
>>
>> Thanks in advance for any information.
>>
>> --
>> http://www.google.com/profiles/grapesmoker
>>
>

-- 
http://www.google.com/profiles/grapesmoker


threads not terminating correctly

2019-04-09 Thread Jerry Vinokurov
Hi all,

I have observed some odd behavior in our NiFi system and I was hoping that
someone might have insight into what was causing this. We are running NiFi
1.8.0 on a single EC2 m5.2xlarge instance and we have a lot of flows that
run on it, I would say somewhere around 25. The flows do not all run at the
same time but are scheduled to fire at various times throughout the day, so
only about a quarter or so of them are ever actually running at once.

The behavior I have observed is that certain processors will indicate that
they are running a thread but that thread will never properly terminate. An
example would be something like ListS3, where one would expect the
processor to fire, check for new files, and either produce flowfiles from
the new data or do nothing. Instead what happens is that the thread icon on
the processor will indicate that it's doing something, but will never
disappear, and the processor will essentially "lock up" and become
inoperable. If we terminate the thread manually and just restart the
processor, it will repeat this behavior and the only thing that solves the
problem is a NiFi restart, which obviously we would like to avoid.

I don't have a systematic way of generating this behavior but I've seen it
happen on ListS3, FetchS3Object, and ListenHTTP processors, at least. It
may also happen on others but I'm not sure and I don't know how to catch
it. My questions are:

1. What could be causing this behavior? Would this be a memory issue or too
many files opened? We've encountered those kinds of issues before but the
symptoms I'm used to seeing with those issues are not present here.
2. Is there any way to catch when this is happening? I wouldn't expect any
processor that isn't doing some kind of heavy lifting (e.g. fetching or
processing large amounts of data) to be active for more than a few seconds
at most, but I'm not sure how to track this sort of thing, or whether it's
possible at all.
3. How can we avoid it, short of restarting our NiFi regularly?

Thanks in advance for any information.

-- 
http://www.google.com/profiles/grapesmoker


Re: merging flowfiles?

2019-02-20 Thread Jerry Vinokurov
As pictured, your setup will not work because MergeContent will not bin the
two connections together. What you'll want to do is to route both
connections through a funnel, which will turn your two connections into
one. Then route the output of the funnel to MergeContent.

On Wed, Feb 20, 2019 at 10:43 AM l vic  wrote:

> Hi,
> I have "original" flowfile in my flow that i have to merge with result of
> REST call ( see attached)
> I am looking at the "MergeContent" processor to do it but it seems from
> documentation that using it for merging different connections is not
> recommended:
>  "It is recommended that the Processor be configured with only a single
> incoming connection, as Group of FlowFiles will not be created from
> FlowFiles in different connections. This processor updates the mime.type
> attribute as appropriate".
> I am not sure I understand how "merging" works in MergeContent... What
> would be the strategy to merge the result of my operation with the original
> flowfile?
> Thank you,
>


-- 
http://www.google.com/profiles/grapesmoker


Re: Modify Flowfile attributes

2019-01-29 Thread Jerry Vinokurov
I wanted to add, since I've done this specific operation many times, that
you can really just do this via the NiFi expression language, which I think
is more "idiomatic" than having ExecuteScript processors all over the
place. Basically, you would have an UpdateAttribute that sets something
called, say, date_extracted with an expression that looks something like
${filename:substringAfterLast('_'):toDate('yyyy.MM.dd')} (this is an
approximation based on the above; modify as necessary for your purpose).
Then you could use a second UpdateAttribute to extract various information
from this date with the format command, e.g.
${date_extracted:format('yyyy')}. I don't think there's one for "week" but
in general this is the approach I take when I need to do date munging.
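For comparison, the filename-to-ISO-week logic from the Jython script in this thread, as plain Python (the sample filename is invented; the layout — date as the seventh underscore-separated field in yymmdd form — follows the script discussed in this thread):

```python
from datetime import datetime

def extract_week_year(filename):
    """Parse the yymmdd date from the 7th underscore-separated field of
    the filename and return its ISO (year, week)."""
    date_part = filename.split("_")[6].split(".")[0]
    d = datetime.strptime(date_part, "%y%m%d").date()
    iso = d.isocalendar()
    return iso[0], iso[1]  # ISO year, ISO week

year, week = extract_week_year("a_b_c_d_e_f_190129.csv")
print(year, week)  # 2019 5
```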

On Tue, Jan 29, 2019 at 10:06 AM Tomislav Novosel 
wrote:

> Hi Matt, thanks for the suggestions. But performance is not crucial here.
> This is the code I tried, but I get the error: "AttributeError: 'NoneType'
> object has no attribute 'getAttribute' at line number 4"
> If I remove the code from line 6 to line 14, it works with some default
> attribute values for year_extracted and week_extracted; otherwise I get
> the error above.
>
> Tom
>
> from datetime import datetime, timedelta, date
>
> flowFile = session.get()
> file_name = flowFile.getAttribute('filename')
>
> date_file = file_name.split("_")[6]
> date_final = date_file.split(".")[0]
> date_obj = datetime.strptime(date_final,'%y%m%d')
> date_year = date_obj.year
> date_day = date_obj.day
> date_month = date_obj.month
>
> week = date(year=date_year, month=date_month, day=date_day).isocalendar()[1]
> year = date(year=date_year, month=date_month, day=date_day).isocalendar()[0]
>
> if (flowFile != None):
>     flowFile = session.putAttribute(flowFile, "year_extracted", year)
>     flowFile = session.putAttribute(flowFile, "week_extracted", week)
>     session.transfer(flowFile, REL_SUCCESS)
>     session.commit()
>
> On Tue, 29 Jan 2019 at 15:53, Matt Burgess  wrote:
>
>> Tom,
>>
>> Keep in mind that you are using Jython not Python, which I mention
>> only to point out that it is *much* slower than the native Java
>> processors such as UpdateAttribute, and slower than other scripting
>> engines such as Groovy or Javascript/Nashorn.
>>
>> If performance/throughput is not a concern and you're more comfortable
>> with Jython, then Jerry's suggestion of session.putAttribute(flowFile,
>> attributeName, attributeValue) should do the trick. Note that if you
>> are adding more than a couple attributes, it's probably better to
>> create a dictionary (eventually/actually, a Java Map)
>> of attribute name/value pairs, and use putAllAttributes(flowFile,
>> attributes) instead, as it is more performant.
>>
>> Regards,
>> Matt
>>
>> On Tue, Jan 29, 2019 at 9:25 AM Tomislav Novosel 
>> wrote:
>> >
>> > Thanks for the answer.
>> >
>> > Yes I know I can handle that with Expression Language and the
>> > UpdateAttribute processor, but this is a specific case at my work and I
>> > think Python is a better and simpler solution. I need to calculate that
>> > with a Python script.
>> >
>> > Tom
>> >
>> > On Tue, 29 Jan 2019 at 15:18, John McGinn 
>> wrote:
>> >>
>> >> Since your script shows that "filename" is an attribute of your
>> >> flowfile, you could use the UpdateAttribute processor.
>> >>
>> >> If you right click on UpdateAttribute and choose Show Usage, then
>> >> choose Expression Language Guide, it shows you the things you can handle.
>> >>
>> >> Something along the lines of ${filename:getDelimitedField(6,'_')}, if
>> >> I understand the code correctly. I did a GenerateFlowFile to an
>> >> UpdateAttribute processor setting filename to "1_2_3_4_5_6.2_abc", then
>> >> sent that to another UpdateAttribute with the getDelimitedField() I
>> >> listed and I received 6.2. Then another UpdateAttribute could parse the
>> >> 6.2 for the second substring, or you might be able to chain them in the
>> >> existing UpdateAttribute processor.
>> >>
>> >>
>> >> 
>> >> On Tue, 1/29/19, Tomislav Novosel  wrote:
>> >>
>> >>  Subject: Modify Flowfile attributes
>> >>  To: users@nifi.apache.org
>> >>  Date: Tuesday, January 29, 2019, 9:04 AM
>> >>
>>  Hi all,
>>  I'm trying to calculate week number and date from filename using
>>  ExecuteScript processor and Jython. Here is the python script. How can
>>  I add calculated attributes week and year to flowfile?
>>  Please help, thank you.
>>  Tom
>>  P.S. Maybe I completely missed with this script. Feel free to correct me.
>>
>>  import json
>>  import java.io
>>  from org.apache.commons.io import IOUtils
>>  from java.nio.charset import StandardCharsets
>>  from org.apache.nifi.processor.io import StreamCallback
>>  from datetime import datetime, timedelta, date
>>
>>  class PyStreamCallback(StreamCallback):
>>      def __init__(self, flowfile):
>>          self.ff = flowfile
>>          pass
>>      def process(self, inputStream, outputStream):
>>          file_name =
>>  

Re: Modify Flowfile attributes

2019-01-29 Thread Jerry Vinokurov
You can do this using the putAttribute function of the flowfile. You're
using getAttribute to get the filename and you can use putAttribute to set
other attributes of the flowfile before transferring it.

On Tue, Jan 29, 2019 at 9:04 AM Tomislav Novosel 
wrote:

> Hi all,
>
> I'm trying to calculate week number and date from filename using
> ExecuteScript processor and Jython. Here is the python script.
> How can I add calculated attributes week and year to flowfile?
>
> Please help, thank you.
> Tom
>
> P.S. Maybe I completely missed with this script. Feel free to correct me.
>
>
> import json
> import java.io
> from org.apache.commons.io import IOUtils
> from java.nio.charset import StandardCharsets
> from org.apache.nifi.processor.io import StreamCallback
> from datetime import datetime, timedelta, date
>
> class PyStreamCallback(StreamCallback):
>     def __init__(self, flowfile):
>         self.ff = flowfile
>         pass
>     def process(self, inputStream, outputStream):
>         file_name = self.ff.getAttribute("filename")
>         date_file = file_name.split("_")[6]
>         date_final = date_file.split(".")[0]
>         date_obj = datetime.strptime(date_final, '%y%m%d')
>         date_year = date_obj.year
>         date_day = date_obj.day
>         date_month = date_obj.month
>
> week = date(year=date_year, month=date_month, day=date_day).isocalendar()[1]
> year = date(year=date_year, month=date_month, day=date_day).isocalendar()[0]
>
> flowFile = session.get()
> if (flowFile != None):
>     session.transfer(flowFile, REL_SUCCESS)
>     session.commit()
>


-- 
http://www.google.com/profiles/grapesmoker


Re: NiFI as Data Pipeline Orchestration Tool?

2019-01-22 Thread Jerry Vinokurov
Hi all,

In our application, we faced the same problem. To solve it, we wrote a
Django app that sat at the center of the interaction between NiFi and
several other systems (including Spark and another internal application)
and used it to dispatch tasks as needed. In that architecture, NiFi was not
itself the orchestrator, but rather interacted with another application
that acted that way. We found that this was a good solution to our problems
that properly divided responsibilities between what NiFi was good at doing
(moving files from place to place) and what was better done in Python code
(many of the tasks described above). If you don't want to go so far as to
write your own orchestrator, you might want to check out crossbar.io, which
could serve the function of communicating between different services.

Jerry

On Tue, Jan 22, 2019 at 11:05 AM Otto Fowler 
wrote:

> How would NiFi look, or have to look, to support batch cases, I wonder.
>
>
> On January 22, 2019 at 10:24:10, Boris Tyukin (bo...@boristyukin.com)
> wrote:
>
> We've looked at both...Airflow might be a way better tool for
> coordination/scheduling. Why do not you take one of your pipelines and try
> to implement it in both tools?
>
> We really liked Airflow but unfortunately, Airflow was not a good fit for
> real-time processes - that's why we decided to go with NiFi. But if you use
> it strictly for job coordination and typical ETL-like dependencies, you
> will have a hard time. Things which are easy and obvious with Airflow or
> ETL tools like Informatica or SSIS are quite difficult with NiFi. Just check
> some examples on Wait/Notify or merge patterns and you will see why.
>
> IMHO since NiFi was designed from the ground up to support real-time use
> cases not batch cases, the design and approach are quite different from
> batch oriented tools like Airflow.
>
> Boris
>
> On Fri, Jan 11, 2019 at 12:02 PM Jonathan Meran 
> wrote:
>
>> Hello,
>>
>> I am looking into the possibility of using NiFi as a Data Pipeline
>> Orchestration Tool. I’m evaluating NiFi along with some other tools such as
>> Airflow and AWS Step Functions/Lambdas.
>>
>>
>>
>> Has anyone used NiFi as an orchestration/scheduling tool for tasks such
>> as submitting spark jobs to an EMR cluster? These are some of the
>> requirements we are considering while evaluating such a tool:
>>
>>
>>
>>1. SSH capabilities to execute remote commands
>>2. Rich scheduling (CRON)
>>3. Ability to write custom routines and import custom libraries
>>4. Event-based triggering of a pipeline
>>
>>
>>
>> Any insight would be helpful. We have used NiFi for about a year now for
>> data movement and are familiar with its capabilities. My biggest worry is
>> the ability to coordinate with other machines using SSH.
>>
>>
>>
>> Thanks,
>>
>> Jon
>>
>

-- 
http://www.google.com/profiles/grapesmoker