Jim,

Regarding format support, both ConvertExcelToCSVProcessor and the newer
ExcelReader are limited to the XLSX format and do not support the older
binary XLS format. The code required to support XLS differs substantially
from the code for XLSX. The older format could be supported through the
Apache POI library, but that has not been implemented. Feel free to file an
Apache NiFi Jira issue requesting general support for XLS. It would be
helpful to describe the use case, since the XLSX format dates back to
Excel 2007.
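
To illustrate how different the two formats are at the container level, here
is a minimal Python sketch (not NiFi code) that tells them apart by their
leading bytes. Binary XLS files are OLE2 compound documents, while XLSX files
are ZIP archives. The function names are hypothetical, and the check only
identifies the container: other Office formats share the same signatures, so
treat a match as a hint rather than proof of an Excel workbook.

```python
# Leading bytes of the two container formats (file-format facts, not NiFi API).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file, used by binary .xls
ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header, used by .xlsx (OOXML)


def sniff_excel_container(header: bytes) -> str:
    """Classify leading bytes as 'xls', 'xlsx', or 'unknown' (hypothetical helper)."""
    if header.startswith(OLE2_MAGIC):
        return "xls"
    if header.startswith(ZIP_MAGIC):
        return "xlsx"
    return "unknown"


def sniff_excel_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify its container."""
    with open(path, "rb") as handle:
        return sniff_excel_container(handle.read(8))
```

A check along these lines could route XLS files to a separate conversion path
until a reader for the older format exists.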

As Joe noted, the newer ExcelReader should be preferred over
ConvertExcelToCSVProcessor, which should probably be deprecated for removal
in the next major release.

Regards,
David Handermann

On Mon, Sep 25, 2023, 8:24 AM Joe Witt <[email protected]> wrote:

> Jim
>
> I don't really recall the history of that specific processor but what it
> can handle is just a function of what it was coded for the libraries it
> uses.  I'm sure older format libraries required some other library.  That
> said I think we should consider removing that component in the 2.x and
> instead favor the ExcelReader [1].  It has the same noted limitation but
> I'm sure that can be addressed.
>
> [1]
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-poi-nar/1.23.2/org.apache.nifi.excel.ExcelReader/index.html
>
> Thanks
>
> On Mon, Sep 25, 2023 at 4:02 AM James McMahon <[email protected]>
> wrote:
>
>> A couple of random questions about ConvertExcelToCSVProcessor:
>>
>> Why does this processor only handle the xlsx Excel file format?  From the
>> Description for ConvertExcelToCSVProcessor:  "*This processor is
>> currently only capable of processing .xlsx (XSSF 2007 OOXML file format)
>> Excel documents and not older .xls (HSSF '97(-2007) file format) documents.*"
>> I ask because it seems unfortunate to have to develop a separate distinct
>> flow path to handle the .xls files that this native processor cannot. Why
>> was it that handling of xls Excel files was not baked into
>> ConvertExcelToCSVProcessor too? Do later releases lift this limitation?
>>
>> What is it about this processor that required including the word
>> Processor in its name? It seems redundant and inconsistent with the naming
>> convention used for the majority of the other processors. I figure there
>> was an interesting reason behind this, and so wanted to ask.
>>
>> I am using a slightly older version of NiFi. Does this limitation go away
>> in later versions?
>>
>> On Mon, Sep 25, 2023 at 3:23 AM Chris Sampson <[email protected]>
>> wrote:
>>
>>> I completely missed the fact that this was an external python conversion
>>> script through the ExecuteStreamCommand, but as Matt says, that will be
>>> catered for in the new NiFi versions.
>>>
>>> From a quick look, although I've not tested to confirm, it appears both
>>> the existing ConvertExcelToCSVProcessor and CSVRecordSetWriter (which can
>>> now be paired with the relatively new ExcelReader,  e.g. in a ConvertRecord
>>> processor) will both set the result flowfile's mime.type attribute as
>>> text/csv, which would allow the expected downstream content viewer
>>> behaviour.
>>>
>>> On Mon, 25 Sept 2023, 06:54 Matt Burgess, <[email protected]> wrote:
>>>
>>>> I added MIME Type properties to ExecuteProcess and ExecuteStreamCommand
>>>> so you can set it explicitly if you want [1]. They will be in the
>>>> 1.24.0 and 2.0 releases.
>>>>
>>>> Regards,
>>>> Matt
>>>>
>>>> [1] https://issues.apache.org/jira/browse/NIFI-12011
>>>>
>>>>
>>>> On Mon, Sep 25, 2023 at 1:41 AM Joe Witt <[email protected]> wrote:
>>>>
>>>>>  Chris
>>>>>
>>>>> Yep. Though this case was ExecuteStreamCommand so following with
>>>>> UpdateAttr as you mention or IdentifyMimeType would do the trick.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Sun, Sep 24, 2023 at 10:30 PM Chris Sampson <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> An UpdateAttribute could also be used to update the mime.type, e.g.
>>>>>> to text/csv.
>>>>>>
>>>>>> I'd think the csv record writer should probably do this automatically
>>>>>> though, so maybe worth a jira to correct that (I'm reasonably sure the
>>>>>> existing json and avro writers do that, for example).
>>>>>>
>>>>>> On Sun, 24 Sept 2023, 23:52 James McMahon, <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> That was it. I was missing the forest for the trees, yet again
>>>>>>> <lol>. I do all the hard work and then forget to IdentifyMimeType at the
>>>>>>> end.
>>>>>>> Thanks very much Joe.
>>>>>>> Jim
>>>>>>>
>>>>>>> On Sun, Sep 24, 2023 at 6:30 PM Joe Witt <[email protected]> wrote:
>>>>>>>
>>>>>>>> Jim,
>>>>>>>>
>>>>>>>> Before you try to view it you can likely run it through
>>>>>>>> IdentifyMimeType.  As you note the conversion from XLS to CSV happens
>>>>>>>> but we still see a mime type of
>>>>>>>> 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
>>>>>>>> so that is likely causing it to not even attempt to display.  So after
>>>>>>>> your python script execution run the data through IdentifyMimeType
>>>>>>>> then you can likely view it just fine.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Sun, Sep 24, 2023 at 3:21 PM James McMahon <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I sure can Joe. Here they are:
>>>>>>>>>
>>>>>>>>> RouteOnAttribute.Route
>>>>>>>>> isExcel
>>>>>>>>> execution.command
>>>>>>>>> /usr/bin/python3
>>>>>>>>> execution.command.args
>>>>>>>>> /opt/nifi/config_resources/scripts/excelToCSV.py
>>>>>>>>> execution.error
>>>>>>>>> Empty string set
>>>>>>>>> execution.status
>>>>>>>>> 0
>>>>>>>>> filename
>>>>>>>>> Alltables.csv
>>>>>>>>> hash.value.md5
>>>>>>>>> b48840c161b645a0169e622dcb8f5083
>>>>>>>>> hash.value.sha256
>>>>>>>>> 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>>>>> isChild
>>>>>>>>> false
>>>>>>>>> mime.extension
>>>>>>>>> .xlsx
>>>>>>>>> mime.type
>>>>>>>>> application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
>>>>>>>>> parent.MD5
>>>>>>>>> b48840c161b645a0169e622dcb8f5083
>>>>>>>>> parent.SHA256
>>>>>>>>> 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>>>>> path
>>>>>>>>> ./
>>>>>>>>> s3.bucket
>>>>>>>>> rampart-raw-data
>>>>>>>>> s3.encryptionStrategy
>>>>>>>>> SSE_S3
>>>>>>>>> s3.etag
>>>>>>>>> b48840c161b645a0169e622dcb8f5083
>>>>>>>>> s3.isLatest
>>>>>>>>> true
>>>>>>>>> s3.lastModified
>>>>>>>>> 1672701227000
>>>>>>>>> s3.length
>>>>>>>>> 830934
>>>>>>>>> s3.owner
>>>>>>>>> b34a7aa80a4130503fee2e8d4c2b674e154af3c4db69db9a4e3bff8a47cc92d1
>>>>>>>>> s3.sseAlgorithm
>>>>>>>>> AES256
>>>>>>>>> s3.storeClass
>>>>>>>>> STANDARD
>>>>>>>>> s3.version
>>>>>>>>> null
>>>>>>>>> sourcing.MD5
>>>>>>>>> b48840c161b645a0169e622dcb8f5083
>>>>>>>>> sourcing.SHA256
>>>>>>>>> 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>>>>> sourcing.sourceMD5
>>>>>>>>> b48840c161b645a0169e622dcb8f5083
>>>>>>>>> sourcing.sourceSHA256
>>>>>>>>> 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>>>>> triage.datatype
>>>>>>>>> excel
>>>>>>>>> uuid
>>>>>>>>> d72ec2e9-cfbd-435e-9954-4f7fae55c550
>>>>>>>>>
>>>>>>>>> Thanks for any help. Perhaps my data is there but I simply can't
>>>>>>>>> render it in the Viewer?
>>>>>>>>> Jim
>>>>>>>>>
>>>>>>>>> On Sun, Sep 24, 2023 at 6:08 PM Joe Witt <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Jim,
>>>>>>>>>>
>>>>>>>>>> If a content type attribute exists and is not a type NiFi
>>>>>>>>>> understands it will not be able to render it.  Can you show what
>>>>>>>>>> flowfile attributes are present at the point you attempt to view it?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> On Sun, Sep 24, 2023 at 3:03 PM James McMahon <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello. I have converted incoming Excel files to csv. I'd like to
>>>>>>>>>>> look at the result, but when I select my flowfiles from the output
>>>>>>>>>>> queue, I can only select "View as hex" - but I cannot get the
>>>>>>>>>>> display to show me the records in the form I expect. Viewing them
>>>>>>>>>>> using the hex display is not helpful.
>>>>>>>>>>>
>>>>>>>>>>> How can I fix this viewing issue?
>>>>>>>>>>>
>>>>>>>>>>> Here is an example of what I can see:
>>>>>>>>>>>
>>>>>>>>>>> 0x00000000 22 54 61 62 6C 65 20 31 2E 20 20 45 73 74 69 6D "Table 1.  Estim
>>>>>>>>>>> 0x00000010 61 74 65 64 20 4D 6F 6E 74 68 6C 79 20 53 61 6C ated Monthly Sal
>>>>>>>>>>> 0x00000020 65 73 20 61 6E 64 20 49 6E 76 65 6E 74 6F 72 69 es and Inventori
>>>>>>>>>>> 0x00000030 65 73 20 66 6F 72 20 4D 61 6E 75 66 61 63 74 75 es for Manufactu
>>>>>>>>>>> 0x00000040 72 65 72 73 2C 20 52 65 74 61 69 6C 65 72 73 2C rers, Retailers,
>>>>>>>>>>> 0x00000050 20 61 6E 64 20 4D 65 72 63 68 61 6E 74 20 57 68  and Merchant Wh
>>>>>>>>>>> 0x00000060 6F 6C 65 73 61 6C 65 72 73 22 2C 55 6E 6E 61 6D olesalers",Unnam
>>>>>>>>>>> 0x00000070 65 64 3A 20 31 2C 55 6E 6E 61 6D 65 64 3A 20 32 ed: 1,Unnamed: 2
>>>>>>>>>>> 0x00000080 2C 55 6E 6E 61 6D 65 64 3A 20 33 2C 55 6E 6E 61 ,Unnamed: 3,Unna
>>>>>>>>>>> 0x00000090 6D 65 64 3A 20 34 2C 55 6E 6E 61 6D 65 64 3A 20 med: 4,Unnamed: 
>>>>>>>>>>> 0x000000A0 35 2C
>>>>>>>>>>>
>>>>>>>>>>
