Thank you, Joe. CompressContent works for me. I chose 'decompress' mode and the 'gzip' compression format, and it works like a charm!
Roland

From: Joe Witt [mailto:[email protected]]
Sent: Wednesday, August 12, 2015 10:38 AM
To: [email protected]
Subject: Re: UnpackContent processor cannot unpack gz file

Hello,

UnpackContent is for dealing with archive formats (tar, zip, etc.). If your file is in a compression format (as is the case with the part-00002.gz file), then you first need to run it through 'CompressContent' in 'decompress' mode. You can even first run it through 'IdentifyMimeType' and set up a flow to handle arbitrarily complicated layers of compression/archive structures.

So for this case:
- GetHDFS (or ListHDFS and FetchHDFS)
- CompressContent (in decompress mode)

Now you have your text-oriented file ready to be dealt with. If you want to deal with each line individually, you can use:
- SplitText (line split count of 1)

Thanks
Joe

On Tue, Aug 11, 2015 at 9:27 PM, 彭光裕 <[email protected]> wrote:

Hi,

I have a compressed file obtained from the GetHDFS processor that I want to unpack with the UnpackContent processor. I have already set the UnpackContent packaging format property to 'tar', but the error below always occurs.
The error log is below (Unable to unpack StandardFlowFileRecord):

2015-08-11 07:10:52,291 ERROR [Timer-Driven Process Thread-4] o.a.n.processors.standard.UnpackContent UnpackContent[id=b90c65e1-b97f-3b4b-9e37-6223afa1ef03] Unable to unpack StandardFlowFileRecord[uuid=85b7d53b-3183-4c48-9160-b2e714b5eaa8,claim=1439248247840-1,offset=0,name=part-00002.gz,size=59212170] due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from UnpackContent[id=b90c65e1-b97f-3b4b-9e37-6223afa1ef03]: java.io.IOException: Error detected parsing the header; routing to failure: org.apache.nifi.processor.exception.ProcessException: IOException thrown from UnpackContent[id=b90c65e1-b97f-3b4b-9e37-6223afa1ef03]: java.io.IOException: Error detected parsing the header

My compressed file is named part-00002.gz, and you can access it here: https://dl.dropboxusercontent.com/u/24808937/part-00002.gz

Any advice on how to solve this problem would be welcome. Thank you!

Roland
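
For anyone finding this thread later: the distinction discussed here (a .gz file is a compression format, not an archive format such as tar) can be checked outside NiFi. Below is a minimal Python sketch using only the standard library. It is not NiFi code and does not reflect how the processors are implemented; the function names classify and decompress_and_split are made up for illustration, and the sketch assumes the decompressed content is UTF-8 text.

```python
import gzip
import io
import tarfile
from typing import List

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def classify(data: bytes) -> str:
    """Return 'gzip', 'tar', or 'unknown' for the given bytes (hypothetical helper)."""
    if data[:2] == GZIP_MAGIC:
        return "gzip"
    try:
        # mode "r:" means "uncompressed tar only", so gzip data is never accepted here
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:"):
            return "tar"
    except tarfile.TarError:
        return "unknown"


def decompress_and_split(data: bytes) -> List[str]:
    """Loosely analogous to CompressContent (decompress, gzip) then SplitText
    with a line split count of 1. Assumes UTF-8 text content."""
    text = gzip.decompress(data).decode("utf-8")
    return text.splitlines()


if __name__ == "__main__":
    sample = gzip.compress(b"line one\nline two\n")
    print(classify(sample))
    print(decompress_and_split(sample))
```

A file that classifies as 'gzip' is one that a tar reader would reject when parsing the header, which is consistent with the error in the log above.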
