With the following script I get week number 44 and year 118, which is a
strange result.
The week should be 1 and the year 2019 for the date 2018-12-31.
What is wrong here?
Tom
from datetime import datetime, timedelta, date

flowFile = session.get()
if (flowFile != None):
    file_name = flowFile.getAttribute('filename')
    date_file = file_name.split("_")[6]
    date_final = date_file.split(".")[0]
    date_obj = datetime.strptime(date_final,'%y%m%d')
    date_year = date_obj.year
    date_day = date_obj.day
    date_month = date_obj.month
    week_att = date(year=date_year, month=date_month,
                    day=date_day).isocalendar()[1]
    year_att = date(year=date_year, month=date_month,
                    day=date_day).isocalendar()[0]
    str_week = str(week_att)
    str_year = str(year_att)
    flowFile = session.putAttribute(flowFile, "year_extracted", str_year)
    flowFile = session.putAttribute(flowFile, "week_extracted", str_week)
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()
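One thing worth checking, offered as an assumption rather than a confirmed diagnosis: the %y directive expects a two-digit year, so an eight-character string such as 20181231 does not match '%y%m%d'. CPython rejects that input with a ValueError; how Jython's strptime handles the mismatch is not verified here. The sketch below runs in plain Python outside NiFi, assumes the date part of the filename looks like 20181231, and uses a hypothetical helper name, iso_year_and_week.

```python
from datetime import datetime


def iso_year_and_week(date_text):
    """Parse a YYYYMMDD string and return (ISO year, ISO week) as strings."""
    # %Y consumes a four-digit year; %y would only consume two digits.
    parsed = datetime.strptime(date_text, "%Y%m%d")
    iso = parsed.date().isocalendar()
    iso_year, iso_week = iso[0], iso[1]
    # Flowfile attribute values are strings, so convert before returning.
    return str(iso_year), str(iso_week)


print(iso_year_and_week("20181231"))  # ('2019', '1')
```

If the filenames actually carry a six-character date such as 181231, '%y%m%d' would be the matching format instead.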
On Tue, 29 Jan 2019 at 16:59, Tomislav Novosel <[email protected]> wrote:
> Thank you all for the answers. The reason I want to do this with a Python
> script is that the week number calculated from the date is wrong. NiFi has
> that function in the Expression Language
> (extracted_date:format("w", <<time_zone>>)). My time zone is GMT+2.
> If I set a date, for example 20180819, and pass GMT as the time zone, I get
> week number 34, which is wrong. If I omit the time zone, I get week number
> 33, which is right. I'm not sure whether that's a bug. You can test it for
> yourself, and if you do, please share your findings here; maybe I'm doing
> something wrong.
>
> On the other hand, if I use Python, I'm more sure that I will get the
> correct week number, even for dates whose week belongs to the next
> year (e.g. 20181231).
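The two dates mentioned above can be checked in plain Python. This is a small sketch outside NiFi; it only shows what the ISO 8601 rule (weeks start on Monday) yields and does not explain the Expression Language result. The helper name iso_parts is hypothetical.

```python
from datetime import date


def iso_parts(year, month, day):
    """Return (ISO year, ISO week, ISO weekday) with Monday = 1 and Sunday = 7."""
    return tuple(date(year, month, day).isocalendar())


print(iso_parts(2018, 8, 19))   # (2018, 33, 7): a Sunday, the last day of week 33
print(iso_parts(2018, 12, 31))  # (2019, 1, 1): a Monday, the first day of week 1 of 2019
```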
>
> Since this calculation will run in production, I need a workflow that
> stays resilient and error-free in the future.
>
> Regarding the script I sent above, I'm getting the error: "week cannot be
> coerced as string". I checked right at the beginning whether the session is
> null or not.
>
> On Tue, 29 Jan 2019, 16:26 Jerry Vinokurov <[email protected] wrote:
>
>> I wanted to add, since I've done this specific operation many times, that
>> you can really just do this via the NiFi expression language, which I think
>> is more "idiomatic" than having ExecuteScript processors all over the
>> place. Basically, you would have an UpdateAttribute that set something
>> called, say, date_extracted with an expression that looks something like
>> ${filename:substringAfterLast('_'):toDate('yyyy.MM.dd')} (this is an
>> approximation based on the above, modify as necessary for your purpose).
>> Then you could use a second UpdateAttribute to extract various information
>> from this date with the format command, e.g. ${date_extracted:format('<your
>> format expression here>')}. I don't think there's one for "week" but in
>> general this is the approach I take when I need to do date munging.
>>
>> On Tue, Jan 29, 2019 at 10:06 AM Tomislav Novosel <[email protected]>
>> wrote:
>>
>>> Hi Matt, thanks for the suggestions. But performance is not crucial here.
>>> This is the code I tried, but I get the error: "AttributeError: 'NoneType'
>>> object has no attribute 'getAttribute' at line number 4"
>>> If I remove the code from line 6 to line 14, it works with some default
>>> attribute values for year_extracted and week_extracted; otherwise I get
>>> the error from above.
>>>
>>> Tom
>>>
>>> from datetime import datetime, timedelta, date
>>>
>>> flowFile = session.get()
>>> file_name = flowFile.getAttribute('filename')
>>>
>>> date_file = file_name.split("_")[6]
>>> date_final = date_file.split(".")[0]
>>> date_obj = datetime.strptime(date_final,'%y%m%d')
>>> date_year = date_obj.year
>>> date_day = date_obj.day
>>> date_month = date_obj.month
>>>
>>> week = date(year=date_year, month=date_month, day=date_day).isocalendar()[1]
>>> year = date(year=date_year, month=date_month, day=date_day).isocalendar()[0]
>>>
>>> if (flowFile != None):
>>>     flowFile = session.putAttribute(flowFile, "year_extracted", year)
>>>     flowFile = session.putAttribute(flowFile, "week_extracted", week)
>>>     session.transfer(flowFile, REL_SUCCESS)
>>>     session.commit()
>>>
>>> On Tue, 29 Jan 2019 at 15:53, Matt Burgess <[email protected]> wrote:
>>>
>>>> Tom,
>>>>
>>>> Keep in mind that you are using Jython not Python, which I mention
>>>> only to point out that it is *much* slower than the native Java
>>>> processors such as UpdateAttribute, and slower than other scripting
>>>> engines such as Groovy or Javascript/Nashorn.
>>>>
>>>> If performance/throughput is not a concern and you're more comfortable
>>>> with Jython, then Jerry's suggestion of session.putAttribute(flowFile,
>>>> attributeName, attributeValue) should do the trick. Note that if you
>>>> are adding more than a couple attributes, it's probably better to
>>>> create a dictionary (eventually/actually, a Java Map<String,String>)
>>>> of attribute name/value pairs, and use putAllAttributes(flowFile,
>>>> attributes) instead, as it is more performant.
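The dictionary approach described above could look roughly like the sketch below. The helper name build_attributes is hypothetical, and the date is assumed to arrive as a YYYYMMDD string. The call into NiFi is left as a comment because session and flowFile exist only inside ExecuteScript. Every value goes through str(), since attribute values are strings.

```python
from datetime import datetime


def build_attributes(date_text):
    """Return a dict of attribute names to string values for a YYYYMMDD date."""
    iso = datetime.strptime(date_text, "%Y%m%d").date().isocalendar()
    # Attribute values are strings, so convert the integers explicitly.
    return {
        "year_extracted": str(iso[0]),
        "week_extracted": str(iso[1]),
    }


attributes = build_attributes("20181231")
print(attributes)

# Inside ExecuteScript (not runnable here), the dict would be passed in one call:
# flowFile = session.putAllAttributes(flowFile, attributes)
```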
>>>>
>>>> Regards,
>>>> Matt
>>>>
>>>> On Tue, Jan 29, 2019 at 9:25 AM Tomislav Novosel <[email protected]>
>>>> wrote:
>>>> >
>>>> > Thanks for the answer.
>>>> >
>>>> > Yes, I know I can handle that with Expression Language and the
>>>> > UpdateAttribute processor, but this is a specific case at my work and
>>>> > I think Python is the better and simpler solution. I need to calculate
>>>> > that with a Python script.
>>>> >
>>>> > Tom
>>>> >
>>>> > On Tue, 29 Jan 2019 at 15:18, John McGinn <[email protected]>
>>>> wrote:
>>>> >>
>>>> >> Since your script shows that "filename" is an attribute of your
>>>> >> flowfile, you could use the UpdateAttribute processor.
>>>> >>
>>>> >> If you right-click on UpdateAttribute and choose ShowUsage, then
>>>> >> choose Expression Language Guide, it shows you the things you can
>>>> >> handle.
>>>> >>
>>>> >> Something along the lines of ${filename:getDelimitedField(6,'_')},
>>>> >> if I understand the Groovy code correctly. I did a GenerateFlowFile to
>>>> >> an UpdateAttribute processor setting filename to "1_2_3_4_5_6.2_abc",
>>>> >> then sent that to another UpdateAttribute with the getDelimitedField()
>>>> >> I listed and I received 6.2. Then another UpdateAttribute could parse
>>>> >> the 6.2 for the second substring, or you might be able to chain them
>>>> >> in the existing UpdateAttribute processor.
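One detail to watch when moving between the two approaches: getDelimitedField(6,'_') returned the sixth field in the test above, while Python's split("_")[6] is zero-based and returns the seventh. A short sketch using the same test filename:

```python
filename = "1_2_3_4_5_6.2_abc"
fields = filename.split("_")

# Python lists are zero-based: index 5 is the sixth field, index 6 the seventh.
sixth_field = fields[5]
seventh_field = fields[6]

print(sixth_field)    # 6.2
print(seventh_field)  # abc

# Dropping everything after the first dot, as the script does with split(".")[0]
print(sixth_field.split(".")[0])  # 6
```

Which index is right depends on where the date actually sits in the real filenames, which the thread does not show.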
>>>> >>
>>>> >>
>>>> >> --------------------------------------------
>>>> >> On Tue, 1/29/19, Tomislav Novosel <[email protected]> wrote:
>>>> >>
>>>> >> Subject: Modify Flowfile attributes
>>>> >> To: [email protected]
>>>> >> Date: Tuesday, January 29, 2019, 9:04 AM
>>>> >>
>>>> >> Hi all,
>>>> >> I'm trying to calculate the week number and date
>>>> >> from the filename using the ExecuteScript processor and Jython. Here
>>>> >> is the Python script. How can I add the calculated
>>>> >> attributes week and year to the flowfile?
>>>> >> Please help, thank you.
>>>> >> Tom
>>>> >> P.S. Maybe I completely missed the mark with this script.
>>>> >> Feel free to correct me.
>>>> >>
>>>> >> import json
>>>> >> import java.io
>>>> >> from org.apache.commons.io import IOUtils
>>>> >> from java.nio.charset import StandardCharsets
>>>> >> from org.apache.nifi.processor.io import StreamCallback
>>>> >> from datetime import datetime, timedelta, date
>>>> >>
>>>> >> class PyStreamCallback(StreamCallback):
>>>> >>     def __init__(self, flowfile):
>>>> >>         self.ff = flowfile
>>>> >>         pass
>>>> >>     def process(self, inputStream, outputStream):
>>>> >>         file_name = self.ff.getAttribute("filename")
>>>> >>         date_file = file_name.split("_")[6]
>>>> >>         date_final = date_file.split(".")[0]
>>>> >>         date_obj = datetime.strptime(date_final,'%y%m%d')
>>>> >>         date_year = date_obj.year
>>>> >>         date_day = date_obj.day
>>>> >>         date_month = date_obj.month
>>>> >>         week = date(year=date_year, month=date_month, day=date_day).isocalendar()[1]
>>>> >>         year = date(year=date_year, month=date_month, day=date_day).isocalendar()[0]
>>>> >>
>>>> >> flowFile = session.get()
>>>> >> if (flowFile != None):
>>>> >>     session.transfer(flowFile, REL_SUCCESS)
>>>> >>     session.commit()
>>>>
>>>
>>
>> --
>> http://www.google.com/profiles/grapesmoker
>>
>