Hi all,

I'm observing some slightly unusual behaviour with my flow and wanted to run a possible explanation past the list.

I'm using NiFi to scrape a website consisting of nested data, e.g. GET http://server/2018/16/11/ returns a webpage full of links to today's data. I'm using a combination of InvokeHTTP (to traverse the hierarchy) and GetHTMLElement (to extract file and directory links), starting at the root, i.e. http://server/, then walking the years, months, days etc.

I'm generating the Remote URLs as ${invokehttp.request.url}${HTMLElement}, where invokehttp.request.url is the URL previously fetched for the day listing in the hierarchy, and HTMLElement is the link to the file extracted by GetHTMLElement. Finally, I've routed "retry" and "failure" back to the InvokeHTTP processor since my network is quite flaky.

Mostly everything is ok, but sometimes I manage to generate URLs which look a bit like this:

http://server/2018/16/11/filename.jsonfilename.json

i.e. the filename part of the URL is duplicated.

My thesis is that this is occurring when there is a network issue, so the flowfile is routed to retry, then the InvokeHTTP processor re-evaluates the expression for the Remote URL, which leads to the duplication of the filename (since invokehttp.request.url will have been updated by the failed request). Does this sound feasible?

My proposed fix for my flow is to use a single attribute for the URL, and UpdateAttribute before InvokeHTTP to set this, so that any retries don't munge the URL.

Many thanks, hope this makes sense.

James
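
For illustration, the suspected mechanism can be modelled outside NiFi with a few lines of Python. This is only a toy sketch of the hypothesis above, not NiFi's actual code: it assumes that InvokeHTTP writes invokehttp.request.url onto the flowfile even when the request fails, and the function names and the target.url attribute are made up for the example.

```python
def evaluate_remote_url(attrs):
    """Stand-in for the expression ${invokehttp.request.url}${HTMLElement}."""
    return attrs["invokehttp.request.url"] + attrs["HTMLElement"]


def invoke_http(attrs, succeed):
    """Toy model of one InvokeHTTP attempt (hypothetical, not NiFi's code)."""
    url = evaluate_remote_url(attrs)
    updated = dict(attrs)
    # Assumption under test: the attribute is updated even on a failed request.
    updated["invokehttp.request.url"] = url
    return url, updated, ("response" if succeed else "retry")


def invoke_http_fixed(attrs, succeed):
    """Same toy model, but the Remote URL is a single precomputed attribute."""
    url = attrs["target.url"]  # hypothetical name, set once before InvokeHTTP
    updated = dict(attrs)
    updated["invokehttp.request.url"] = url
    return url, updated, ("response" if succeed else "retry")


# Flowfile as it leaves GetHTMLElement for one file link on the day page.
flowfile = {
    "invokehttp.request.url": "http://server/2018/16/11/",
    "HTMLElement": "filename.json",
}

# First attempt fails and is routed to retry; second attempt re-evaluates.
first_url, flowfile_after, _ = invoke_http(flowfile, succeed=False)
second_url, _, _ = invoke_http(flowfile_after, succeed=True)

# Proposed fix: compute the URL once (the UpdateAttribute step), then retry.
fixed = dict(flowfile, **{"target.url": evaluate_remote_url(flowfile)})
fixed_first, fixed_after, _ = invoke_http_fixed(fixed, succeed=False)
fixed_second, _, _ = invoke_http_fixed(fixed_after, succeed=True)

print(first_url)
print(second_url)
print(fixed_second)
```

In this model the second attempt requests the URL with the filename appended twice, while the version using a single precomputed attribute requests the same URL on every attempt. Whether the real processor behaves this way on retry is exactly the question being asked.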
