Thanks for looking into this, Tim!

I just created a ticket in JIRA.
https://issues.apache.org/jira/browse/TIKA-2761

Nick

> On Oct 17, 2018, at 8:21 AM, Tim Allison <[email protected]> wrote:
> 
> Nick,
>  I'm sorry for the delay.  The XHTMLContentHandler writes everything
> that is in the Metadata object, as <meta> elements in the <head>, when
> the parser writes the first "content" element.  In the MP3Parser, this
> is the <h1> element, which is written before the sample rate is added
> to the Metadata object.  Any metadata that is added afterwards does not
> show up in the xhtml, but it is still retrievable from the Metadata
> object.
>  This is one of the limitations of a streaming write.  As I look at
> the code of the MP3Parser, I _think_ it would be trivial to write the
> metadata before writing any content, and it wouldn't get in the way of
> a streaming parse because the parser reads the whole file and caches
> the content as it goes -- only writing once it has finished reading
> the file.
>  Please open a ticket on our JIRA, and I'll take care of it.
> 
>          Best,
> 
>                 Tim
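A minimal, self-contained Java sketch of the ordering problem described above. It is a toy model, not Tika's `XHTMLContentHandler`: `LazyHeadDemo`, `StreamingXhtmlWriter`, `currentOrder` and `metadataFirst` are hypothetical names made up for illustration, and the writer does no XML escaping. The writer turns whatever is in the metadata map into `<meta>` tags at the moment the first body element is written. A value added after the `<h1>` therefore stays in the map but never reaches the `<head>`, while setting all metadata before the first element (the reordering suggested above) puts it there.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model of the ordering problem; NOT Tika's XHTMLContentHandler.
 * All class and method names here are hypothetical.
 */
public class LazyHeadDemo {

    /** Streaming writer: the <head> is emitted once, on the first body element. */
    static final class StreamingXhtmlWriter {
        private final Map<String, String> metadata;
        private final StringBuilder out = new StringBuilder();
        private boolean headWritten = false;

        StreamingXhtmlWriter(Map<String, String> metadata) {
            this.metadata = metadata;
        }

        void element(String name, String text) {
            if (!headWritten) {
                // Only keys present at this moment become <meta> tags; a
                // streaming writer cannot go back and amend the head later.
                out.append("<html><head>");
                for (Map.Entry<String, String> e : metadata.entrySet()) {
                    out.append("<meta name=\"").append(e.getKey())
                       .append("\" content=\"").append(e.getValue()).append("\"/>");
                }
                out.append("</head><body>");
                headWritten = true;
            }
            // No XML escaping: acceptable for this toy data, not for real output.
            out.append('<').append(name).append('>').append(text)
               .append("</").append(name).append('>');
        }

        String finish() {
            return out.append("</body></html>").toString();
        }
    }

    /** Current order: the sample rate is set after the <h1> has been written. */
    static String currentOrder(Map<String, String> md) {
        StreamingXhtmlWriter xhtml = new StreamingXhtmlWriter(md);
        md.put("dc:title", "Rehab (Glee Cast Version)");
        xhtml.element("h1", "Rehab (Glee Cast Version)"); // head is flushed here
        md.put("xmpDM:audioSampleRate", "44100");          // too late for <head>
        xhtml.element("p", "Glee Cast");
        return xhtml.finish();
    }

    /** Proposed order: all metadata is set before any content is written. */
    static String metadataFirst(Map<String, String> md) {
        StreamingXhtmlWriter xhtml = new StreamingXhtmlWriter(md);
        md.put("dc:title", "Rehab (Glee Cast Version)");
        md.put("xmpDM:audioSampleRate", "44100");
        xhtml.element("h1", "Rehab (Glee Cast Version)");
        xhtml.element("p", "Glee Cast");
        return xhtml.finish();
    }

    public static void main(String[] args) {
        Map<String, String> md = new LinkedHashMap<>();
        System.out.println(currentOrder(md));
        // The value is still in the map, just not in the XHTML.
        System.out.println("map: " + md.get("xmpDM:audioSampleRate"));
        System.out.println(metadataFirst(new LinkedHashMap<>()));
    }
}
```

Running `main` prints a head without `xmpDM:audioSampleRate` for the current order, then `map: 44100`, then a head that does include the sample rate.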
> On Sun, Oct 14, 2018 at 11:08 PM Nick Sincaglia <[email protected]> wrote:
>> 
>> I was wondering if anyone might have some insight into why the XML output
>> does not contain some of the technical file information that the JSON and
>> text versions do. Is this something that can be fixed? Could someone
>> suggest a way to go about identifying the root cause and fixing it?
>> 
>> Thanks,
>> 
>> Nick
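One way to narrow down the root cause, sketched here under the assumption that you have the metadata keys (for example from the 'Metadata' or JSON view) and the XHTML output as strings, is to diff the keys against the `name` attributes of the `<meta>` elements. `MetaDiff` and `missingFromXhtml` are hypothetical names; only JDK DOM APIs are used. A key that appears only in the metadata view is a candidate for having been set after the `<head>` was written. Not every gap has that cause, though: the title, for instance, appears in the 'Structured Text' view as `<title>` rather than as a `<meta name="title">` tag.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class MetaDiff {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Returns the metadata keys with no matching <meta name="..."> element in the XHTML. */
    static Set<String> missingFromXhtml(List<String> metadataKeys, String xhtml) {
        Set<String> metaNames = new LinkedHashSet<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the output declares the XHTML namespace
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            NodeList metas = doc.getElementsByTagNameNS(XHTML_NS, "meta");
            for (int i = 0; i < metas.getLength(); i++) {
                metaNames.add(((Element) metas.item(i)).getAttribute("name"));
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XHTML", e);
        }
        // Keep insertion order so the report follows the metadata listing.
        Set<String> missing = new LinkedHashSet<>(metadataKeys);
        missing.removeAll(metaNames);
        return missing;
    }

    public static void main(String[] args) {
        // Abbreviated from the 'Structured Text' and 'Metadata' views quoted in this thread.
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head>"
                + "<meta name=\"Content-Type\" content=\"audio/mpeg\"/>"
                + "<meta name=\"dc:title\" content=\"Rehab (Glee Cast Version)\"/>"
                + "<title>Rehab (Glee Cast Version)</title></head><body/></html>";
        List<String> keys = Arrays.asList(
                "Content-Type", "dc:title", "channels", "samplerate", "xmpDM:audioSampleRate");
        System.out.println(missingFromXhtml(keys, xhtml));
        // prints [channels, samplerate, xmpDM:audioSampleRate]
    }
}
```

With the abbreviated data in `main`, the reported keys match three of the fields that are present in the 'Metadata' view but absent from the 'Structured Text' view.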
>> 
>> On Oct 8, 2018, at 9:31 PM, Nick Sincaglia <[email protected]> wrote:
>> 
>> I am using the Tika 1.19 GUI to extract metadata from an .mp3 file. The
>> sample rate is available and I am able to access it, but only as plain text
>> or as part of a JSON document. I am working in XML and would like to use an
>> XML content handler. But when the metadata is returned as ‘structured text’
>> (XML), the sample rate is not returned. I have tried using Tika 1.19 in a
>> Maven project and experimented with different ContentHandlers, and the same
>> issue occurs. I cannot seem to get the sample rate returned in an XML doc,
>> but I am able to access the data from the Metadata object itself. If the
>> metadata is returned as a string, the sample rate is there; if it is
>> returned as XML, the sample rate is not. I am wondering what I am doing
>> wrong or misunderstanding. Perhaps it is an issue with the parser or the
>> ContentHandler that is used?
>>
>> Tika 1.19 ‘Metadata’ view (sample rate is available):
>>
>> Author: Glee Cast
>> Content-Length: 8251946
>> Content-Type: audio/mpeg
>> X-Parsed-By: org.apache.tika.parser.DefaultParser
>> X-Parsed-By: org.apache.tika.parser.mp3.Mp3Parser
>> X-TIKA:digest:MD5: e0bdf3a0e171fca838604f9baad46612
>> X-TIKA:digest:SHA256: ea1e4aa998f2c6e80139fa100c62fc1ee17652cf702cd484532b90183e7c5cc0
>> channels: 2
>> creator: Glee Cast
>> dc:creator: Glee Cast
>> dc:title: Rehab (Glee Cast Version)
>> meta:author: Glee Cast
>> resourceName: USQX90900223_A4_T7.mp3
>> samplerate: 44100
>> title: Rehab (Glee Cast Version)
>> version: MPEG 3 Layer III Version 1
>> xmpDM:album: Glee: The Music, The Complete Season One
>> xmpDM:artist: Glee Cast
>> xmpDM:audioChannelType: Stereo
>> xmpDM:audioCompressor: MP3
>> xmpDM:audioSampleRate: 44100
>> xmpDM:duration: 206301.296875
>> xmpDM:genre:
>> xmpDM:logComment: XXX -
>> (P) 2009 Twentieth Century Fox Television - USQX90900223
>> xmpDM:releaseDate:
>> xmpDM:trackNumber: 4
>>
>> Tika 1.19 ‘Structured Text’ view (no sample rate):
>>
>> <?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
>> <head>
>> <meta name="xmpDM:genre" content=""/>
>> <meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
>> <meta name="X-Parsed-By" content="org.apache.tika.parser.mp3.Mp3Parser"/>
>> <meta name="creator" content="Glee Cast"/>
>> <meta name="xmpDM:album" content="Glee: The Music, The Complete Season One"/>
>> <meta name="xmpDM:releaseDate" content=""/>
>> <meta name="meta:author" content="Glee Cast"/>
>> <meta name="xmpDM:artist" content="Glee Cast"/>
>> <meta name="X-TIKA:digest:SHA256" content="ea1e4aa998f2c6e80139fa100c62fc1ee17652cf702cd484532b90183e7c5cc0"/>
>> <meta name="dc:creator" content="Glee Cast"/>
>> <meta name="xmpDM:audioCompressor" content="MP3"/>
>> <meta name="resourceName" content="USQX90900223_A4_T7.mp3"/>
>> <meta name="xmpDM:logComment" content="XXX - &#10;(P) 2009 Twentieth Century Fox Television - USQX90900223"/>
>> <meta name="dc:title" content="Rehab (Glee Cast Version)"/>
>> <meta name="Author" content="Glee Cast"/>
>> <meta name="Content-Length" content="8251946"/>
>> <meta name="X-TIKA:digest:MD5" content="e0bdf3a0e171fca838604f9baad46612"/>
>> <meta name="Content-Type" content="audio/mpeg"/>
>> <title>Rehab (Glee Cast Version)</title>
>> </head>
>> <body><h1>Rehab (Glee Cast Version)</h1>
>> <p>Glee Cast</p>
>> <p>Glee: The Music, The Complete Season One, track 4</p>
>> <p>206301.3</p>
>> <p>XXX -  (P) 2009 Twentieth Century Fox Television - USQX90900223</p>
>> </body></html>
>>
>> Tika 1.19 Recursive JSON view (the sample rate is there):
>>
>> [
>>  {
>>    "Author": "Glee Cast",
>>    "Content-Type": "audio/mpeg",
>>    "X-Parsed-By": [
>>      "org.apache.tika.parser.DefaultParser",
>>      "org.apache.tika.parser.mp3.Mp3Parser"
>>    ],
>>    "X-TIKA:content": "Rehab (Glee Cast Version)\nGlee Cast\nGlee: The Music, The Complete Season One, track 4\n206301.3\nXXX - \n(P) 2009 Twentieth Century Fox Television - USQX90900223\n",
>>    "X-TIKA:digest:MD5": "e0bdf3a0e171fca838604f9baad46612",
>>    "X-TIKA:digest:SHA256": "ea1e4aa998f2c6e80139fa100c62fc1ee17652cf702cd484532b90183e7c5cc0",
>>    "X-TIKA:parse_time_millis": "86",
>>    "channels": "2",
>>    "creator": "Glee Cast",
>>    "dc:creator": "Glee Cast",
>>    "dc:title": "Rehab (Glee Cast Version)",
>>    "meta:author": "Glee Cast",
>>    "samplerate": "44100",
>>    "title": "Rehab (Glee Cast Version)",
>>    "version": "MPEG 3 Layer III Version 1",
>>    "xmpDM:album": "Glee: The Music, The Complete Season One",
>>    "xmpDM:artist": "Glee Cast",
>>    "xmpDM:audioChannelType": "Stereo",
>>    "xmpDM:audioCompressor": "MP3",
>>    "xmpDM:audioSampleRate": "44100",
>>    "xmpDM:duration": "206301.296875",
>>    "xmpDM:genre": "",
>>    "xmpDM:logComment": "XXX - \n(P) 2009 Twentieth Century Fox Television - USQX90900223",
>>    "xmpDM:releaseDate": "",
>>    "xmpDM:trackNumber": "4"
>>  }
>> ]
>> 
>> 
>> 
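As a possible interim workaround, since the values are still retrievable from the `Metadata` object once parsing has finished, you can serialize that object to XML yourself. This sketch uses only the JDK's StAX writer and takes a plain `Map<String, List<String>>`. Filling that map from Tika (for example by looping over `metadata.names()` and `metadata.getValues(name)` after `parse(...)` returns) is assumed rather than shown. The class name `MetadataToXml` and the `<metadata>`/`<meta>` element layout are hypothetical choices, not a Tika format.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class MetadataToXml {

    /** Writes each name/value pair as <meta name="...">value</meta> under a <metadata> root. */
    static String toXml(Map<String, List<String>> metadata) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartDocument();
            w.writeStartElement("metadata");
            for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
                for (String value : e.getValue()) {   // multi-valued keys, e.g. X-Parsed-By
                    w.writeStartElement("meta");
                    w.writeAttribute("name", e.getKey());
                    w.writeCharacters(value);         // escapes markup characters such as & and <
                    w.writeEndElement();
                }
            }
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException ex) {
            throw new IllegalStateException(ex);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample values taken from the 'Metadata' view quoted above.
        Map<String, List<String>> md = new LinkedHashMap<>();
        md.put("X-Parsed-By", Arrays.asList(
                "org.apache.tika.parser.DefaultParser",
                "org.apache.tika.parser.mp3.Mp3Parser"));
        md.put("samplerate", Arrays.asList("44100"));
        md.put("xmpDM:logComment", Arrays.asList(
                "XXX - \n(P) 2009 Twentieth Century Fox Television - USQX90900223"));
        System.out.println(toXml(md));
    }
}
```

Values are written as element text rather than as a `content` attribute. An XML parser normalizes a raw newline inside an attribute value to a space, so a multi-line value such as `xmpDM:logComment` would not survive a round trip unless the newline were escaped (Tika's own XHTML output writes it as `&#10;`).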

