Jim

I don't really recall the history of that specific processor, but what it
can handle is just a function of what it was coded for and the libraries it
uses.  I'm sure supporting the older format would have required some other
library.  That said, I think we should consider removing that component in
2.x and instead favor the ExcelReader [1].  It has the same noted
limitation, but I'm sure that can be addressed.

[1]
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-poi-nar/1.23.2/org.apache.nifi.excel.ExcelReader/index.html

Thanks
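
For anyone who has to route .xls and .xlsx files down separate paths in the
meantime, here is a minimal sketch of telling the two apart by their leading
bytes. The function name sniff_excel_container is hypothetical and is not part
of NiFi. The check only identifies the container (ZIP for OOXML, OLE2 compound
file for the older format), so other ZIP-based or OLE2-based documents would
match as well; treat it as a first-pass filter rather than a full validation.

```python
# Leading bytes of the two container formats Excel files have used.
ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header; .xlsx (OOXML) is a ZIP archive
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file; legacy .xls


def sniff_excel_container(data: bytes) -> str:
    """Guess the Excel container type from the first bytes of the content.

    Returns "xlsx" for a ZIP container, "xls" for an OLE2 container,
    and "unknown" otherwise. This is an assumption-based heuristic:
    it does not confirm that the file really holds a workbook.
    """
    if data.startswith(ZIP_MAGIC):
        return "xlsx"
    if data.startswith(OLE2_MAGIC):
        return "xls"
    return "unknown"
```

A script run through ExecuteStreamCommand could call this on the bytes it
reads from stdin and pick a conversion path accordingly.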

On Mon, Sep 25, 2023 at 4:02 AM James McMahon <[email protected]> wrote:

> A couple of random questions about ConvertExcelToCSVProcessor:
>
> Why does this processor only handle the xlsx Excel file format?  From the
> Description for ConvertExcelToCSVProcessor:  "*This processor is
> currently only capable of processing .xlsx (XSSF 2007 OOXML file format)
> Excel documents and not older .xls (HSSF '97(-2007) file format) documents.*"
> I ask because it seems unfortunate to have to develop a separate distinct
> flow path to handle the .xls files that this native processor cannot. Why
> was it that handling of xls Excel files was not baked into
> ConvertExcelToCSVProcessor too? Do later releases lift this limitation?
>
> What is it about this processor that required including the word Processor
> in its name? It seems redundant and inconsistent with the naming convention
> used for the majority of the other processors. I figure there was an
> interesting reason behind this, and so wanted to ask.
>
> I am using a slightly older version of NiFi. Does this limitation go away
> in later versions?
>
> On Mon, Sep 25, 2023 at 3:23 AM Chris Sampson <[email protected]>
> wrote:
>
>> I completely missed the fact that this was an external python conversion
>> script through the ExecuteStreamCommand, but as Matt says, that will be
>> catered for in the new NiFi versions.
>>
>> From a quick look, although I've not tested to confirm, it appears that
>> the existing ConvertExcelToCSVProcessor and CSVRecordSetWriter (which can
>> now be paired with the relatively new ExcelReader, e.g. in a ConvertRecord
>> processor) will both set the result flowfile's mime.type attribute to
>> text/csv, which would allow the expected downstream content viewer
>> behaviour.
>>
>> On Mon, 25 Sept 2023, 06:54 Matt Burgess, <[email protected]> wrote:
>>
>>> I added MIME Type properties to ExecuteProcess and ExecuteStreamCommand
>>> so you can set it explicitly if you want [1]. They will be in the 1.24.0
>>> and 2.0 releases.
>>>
>>> Regards,
>>> Matt
>>>
>>> [1] https://issues.apache.org/jira/browse/NIFI-12011
>>>
>>>
>>> On Mon, Sep 25, 2023 at 1:41 AM Joe Witt <[email protected]> wrote:
>>>
>>>>  Chris
>>>>
>>>> Yep. Though this case was ExecuteStreamCommand, so following it with
>>>> UpdateAttribute as you mention, or IdentifyMimeType, would do the trick.
>>>>
>>>> Thanks
>>>>
>>>> On Sun, Sep 24, 2023 at 10:30 PM Chris Sampson <
>>>> [email protected]> wrote:
>>>>
>>>>> An UpdateAttribute could also be used to update the mime.type, e.g. to
>>>>> text/csv.
>>>>>
>>>>> I'd think the csv record writer should probably do this automatically
>>>>> though, so maybe worth a jira to correct that (I'm reasonably sure the
>>>>> existing json and avro writers do that, for example).
>>>>>
>>>>> On Sun, 24 Sept 2023, 23:52 James McMahon, <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> That was it. I was missing the forest for the trees, yet again <lol>.
>>>>>> I do all the hard work and then forget to IdentifyMimeType at the end.
>>>>>> Thanks very much Joe.
>>>>>> Jim
>>>>>>
>>>>>> On Sun, Sep 24, 2023 at 6:30 PM Joe Witt <[email protected]> wrote:
>>>>>>
>>>>>>> Jim,
>>>>>>>
>>>>>>> Before you try to view it you can likely run it through
>>>>>>> IdentifyMimeType.  As you note the conversion from XLS to CSV happens
>>>>>>> but we still see a mime type of
>>>>>>> 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
>>>>>>> so that is likely causing it to not even attempt to display.  So after
>>>>>>> your python script execution run the data through IdentifyMimeType,
>>>>>>> then you can likely view it just fine.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Sun, Sep 24, 2023 at 3:21 PM James McMahon <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I sure can Joe. Here they are:
>>>>>>>>
>>>>>>>> RouteOnAttribute.Route
>>>>>>>> isExcel
>>>>>>>> execution.command
>>>>>>>> /usr/bin/python3
>>>>>>>> execution.command.args
>>>>>>>> /opt/nifi/config_resources/scripts/excelToCSV.py
>>>>>>>> execution.error
>>>>>>>> Empty string set
>>>>>>>> execution.status
>>>>>>>> 0
>>>>>>>> filename
>>>>>>>> Alltables.csv
>>>>>>>> hash.value.md5
>>>>>>>> b48840c161b645a0169e622dcb8f5083
>>>>>>>> hash.value.sha256
>>>>>>>> 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>>>> isChild
>>>>>>>> false
>>>>>>>> mime.extension
>>>>>>>> .xlsx
>>>>>>>> mime.type
>>>>>>>> application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
>>>>>>>> parent.MD5
>>>>>>>> b48840c161b645a0169e622dcb8f5083
>>>>>>>> parent.SHA256
>>>>>>>> 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>>>> path
>>>>>>>> ./
>>>>>>>> s3.bucket
>>>>>>>> rampart-raw-data
>>>>>>>> s3.encryptionStrategy
>>>>>>>> SSE_S3
>>>>>>>> s3.etag
>>>>>>>> b48840c161b645a0169e622dcb8f5083
>>>>>>>> s3.isLatest
>>>>>>>> true
>>>>>>>> s3.lastModified
>>>>>>>> 1672701227000
>>>>>>>> s3.length
>>>>>>>> 830934
>>>>>>>> s3.owner
>>>>>>>> b34a7aa80a4130503fee2e8d4c2b674e154af3c4db69db9a4e3bff8a47cc92d1
>>>>>>>> s3.sseAlgorithm
>>>>>>>> AES256
>>>>>>>> s3.storeClass
>>>>>>>> STANDARD
>>>>>>>> s3.version
>>>>>>>> null
>>>>>>>> sourcing.MD5
>>>>>>>> b48840c161b645a0169e622dcb8f5083
>>>>>>>> sourcing.SHA256
>>>>>>>> 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>>>> sourcing.sourceMD5
>>>>>>>> b48840c161b645a0169e622dcb8f5083
>>>>>>>> sourcing.sourceSHA256
>>>>>>>> 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>>>> triage.datatype
>>>>>>>> excel
>>>>>>>> uuid
>>>>>>>> d72ec2e9-cfbd-435e-9954-4f7fae55c550
>>>>>>>>
>>>>>>>> Thanks for any help. Perhaps my data is there but I simply can't
>>>>>>>> render it in the Viewer?
>>>>>>>> Jim
>>>>>>>>
>>>>>>>> On Sun, Sep 24, 2023 at 6:08 PM Joe Witt <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Jim,
>>>>>>>>>
>>>>>>>>> If a content type attribute exists and is not a type NiFi
>>>>>>>>> understands it will not be able to render it.  Can you show what
>>>>>>>>> flowfile attributes are present at the point you attempt to view it?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Sun, Sep 24, 2023 at 3:03 PM James McMahon <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hello. I have converted incoming Excel files to csv. I'd like to
>>>>>>>>>> look at the result, but when I select my flowfiles from the output
>>>>>>>>>> queue, I can only select "View as hex" - but I cannot get the
>>>>>>>>>> display to show me the records in the form I expect. Viewing them
>>>>>>>>>> using the hex display is not helpful.
>>>>>>>>>>
>>>>>>>>>> How can I fix this viewing issue?
>>>>>>>>>>
>>>>>>>>>> Here is an example of what I can see:
>>>>>>>>>>
>>>>>>>>>> 0x00000000 22 54 61 62 6C 65 20 31 2E 20 20 45 73 74 69 6D "Table 1.  Estim
>>>>>>>>>> 0x00000010 61 74 65 64 20 4D 6F 6E 74 68 6C 79 20 53 61 6C ated Monthly Sal
>>>>>>>>>> 0x00000020 65 73 20 61 6E 64 20 49 6E 76 65 6E 74 6F 72 69 es and Inventori
>>>>>>>>>> 0x00000030 65 73 20 66 6F 72 20 4D 61 6E 75 66 61 63 74 75 es for Manufactu
>>>>>>>>>> 0x00000040 72 65 72 73 2C 20 52 65 74 61 69 6C 65 72 73 2C rers, Retailers,
>>>>>>>>>> 0x00000050 20 61 6E 64 20 4D 65 72 63 68 61 6E 74 20 57 68  and Merchant Wh
>>>>>>>>>> 0x00000060 6F 6C 65 73 61 6C 65 72 73 22 2C 55 6E 6E 61 6D olesalers",Unnam
>>>>>>>>>> 0x00000070 65 64 3A 20 31 2C 55 6E 6E 61 6D 65 64 3A 20 32 ed: 1,Unnamed: 2
>>>>>>>>>> 0x00000080 2C 55 6E 6E 61 6D 65 64 3A 20 33 2C 55 6E 6E 61 ,Unnamed: 3,Unna
>>>>>>>>>> 0x00000090 6D 65 64 3A 20 34 2C 55 6E 6E 61 6D 65 64 3A 20 med: 4,Unnamed: 
>>>>>>>>>> 0x000000A0 35 2C 55 6E 6E 61 6D 65 64 3A 20 36 2C 55 6E 6E 5,Unnamed: 6,Unn
>>>>>>>>>> 0x000000B0 61 6D 65 64 3A
>>>>>>>>>>
>>>>>>>>>
