I just realised that GetHTTP doesn't accept input:

@Tags({"get", "fetch", "poll", "http", "https", "ingest", "source", "input"})
@InputRequirement(Requirement.INPUT_FORBIDDEN)  // incoming connections are not allowed
@CapabilityDescription("Fetches a file via HTTP")
@WritesAttributes({
    @WritesAttribute(attribute = "filename", description = "The filename is set to the name of the file on the remote server"),
    @WritesAttribute(attribute = "mime.type", description = "The MIME Type of the FlowFile, as reported by the HTTP Content-Type header")
})
public class GetHTTP extends AbstractSessionFactoryProcessor

Given that, GetHTTP is not suitable for this use case. I can only use
InvokeHTTP, either alone or in combination with ControlRate.
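
To make the ControlRate + InvokeHTTP idea a bit more concrete, here is a
rough plain-Java sketch of the throttling arithmetic only. It does not use
any NiFi API: the class name ThrottleSketch, the method releaseTimes, and
the example numbers (2 requests per 2-second window) are my own assumptions
for illustration, and ControlRate's real behaviour may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; nothing here is NiFi API.
class ThrottleSketch {

    /**
     * Gives each of itemCount items a release time in milliseconds so that
     * at most maxPerWindow items fall into each fixed window of windowMillis.
     */
    static List<Long> releaseTimes(int itemCount, int maxPerWindow, long windowMillis) {
        if (itemCount < 0 || maxPerWindow <= 0 || windowMillis <= 0) {
            throw new IllegalArgumentException(
                "itemCount must be >= 0; maxPerWindow and windowMillis must be > 0");
        }
        List<Long> times = new ArrayList<>(itemCount);
        for (int i = 0; i < itemCount; i++) {
            long windowIndex = i / maxPerWindow;   // which window this item lands in
            times.add(windowIndex * windowMillis); // released at the start of that window
        }
        return times;
    }

    public static void main(String[] args) {
        // Assumed numbers: 2000 split lines, 2 requests allowed per 2-second window.
        List<Long> times = releaseTimes(2000, 2, 2000L);
        System.out.println("first release at ms: " + times.get(0));
        System.out.println("last release at ms:  " + times.get(times.size() - 1));
    }
}
```

With those assumed numbers the last request would go out roughly 33 minutes
after the first, so the window size and per-window count are the knobs to
tune against what the web site can tolerate.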



On Sat, Feb 13, 2016 at 11:38 AM, Jeff - Data Bean Australia <
[email protected]> wrote:

> Thank you both, Joe and Simon, for pointing out InvokeHTTP and ControlRate
> to me.
>
> Regarding InvokeHTTP, I have a couple of questions for you.
>
> Both InvokeHTTP and GetHTTP have "Concurrent Tasks" and "Run
> schedule" settings. If my use case only involves the GET method, why is
> InvokeHTTP better? I noticed that InvokeHTTP inherits from AbstractProcessor,
> while AbstractProcessor and GetHTTP share the same parent,
> AbstractSessionFactoryProcessor. Can you explain what enhancement
> InvokeHTTP gets by going one level further down the hierarchy?
>
> ControlRate and InvokeHTTP are at the same level in the class
> hierarchy. Simon pointed out that ControlRate and InvokeHTTP can work
> together for more intuitive control. This looks wonderful. Both of you
> mentioned backpressure. What is the difference in backpressure behaviour
> between using InvokeHTTP alone and combining it with ControlRate?
>
> Can ControlRate work with GetHTTP as well? If yes, what would the
> difference be?
>
> Thanks,
> Jeff
>
> On Fri, Feb 12, 2016 at 7:06 PM, Simon Ball <[email protected]> wrote:
>
>> Jeff,
>>
>> Another approach I've used with some success is the ControlRate processor
>> before the InvokeHTTP, which gives you an intuitive way of limiting the
>> number of requests in a specified time interval. Note, however, that this
>> does need to be combined with back pressure control to prevent requests
>> queuing behind the InvokeHTTP. Feeding failure retries back into the
>> ControlRate, or into a funnel before the InvokeHTTP, also gives you a bit
>> more control over groupings of retries, for example.
>>
>> Simon
>>
>> —
>> Simon Elliston Ball
>> Solutions Engineer - EMEA
>> +44 7930 424111
>> Hortonworks - We Do Hadoop
>>
>>
>> On 12 Feb 2016, at 05:34, Joe Witt <[email protected]> wrote:
>>
>> Jeff,
>>
>> This is definitely a strong use case for nifi.
>>
>> It might be that InvokeHTTP is the better choice here.
>>
>> If what you'd like to do is effectively throttle the rate at which you
>> hit the web service with the InvokeHTTP calls, you can schedule that
>> processor to run as often as you like (for example every 100 ms).
>> Then use backpressure settings on the queues feeding that InvokeHTTP
>> processor. Effectively you can control where data will back up in the
>> flow while it is being throttled.
>>
>> If the lookup data is a good candidate for caching then there may be
>> other great options to make this more efficient.
>>
>> Perhaps you can share a flow template of what you have so far and we
>> can make recommendations on next steps?
>>
>> Thanks
>> Joe
>>
>> On Fri, Feb 12, 2016 at 12:28 AM, Jeff - Data Bean Australia
>> <[email protected]> wrote:
>>
>> Hi
>>
>> I have a use case like this:
>>
>> There is a file that contains thousands of items, one per line. Each
>> item should trigger one GetHTTP processor to fetch some data.
>>
>> Here is what I am trying to do:
>>
>> 1. Fetch this file
>> 2. For each line, generate one file using SplitText
>> 3. Drive GetHTTP downstream.
>>
>> However, given there are more than 2000 lines, more than 2000 HTTP Get
>> processes will be created, flooding one web site, which doesn't sound
>> like a good idea. So I would like to control the processors so that only
>> a couple of them are running at the same time, and maybe delay for a
>> couple of seconds after each finishes.
>>
>> How can I do that in NiFi?
>>
>> Thanks,
>> Jeff
>>
>> --
>>
>> Data Bean - A Big Data Solution Provider in Australia.
>>
>>
>>
>
>
>



-- 
Data Bean - A Big Data Solution Provider in Australia.
