ID3v2 support would be great - it appears ID3v1 is widely used in music MP3
files, but not in podcast MP3s.

Anyway, if anyone is having a similar problem, here's some code that appears to
work using Apache Commons HttpClient (3.x API).

HTTP Range requests for MP3 metadata:

                // Imports this fragment needs (Commons HttpClient 3.x and Tika):
                import java.io.InputStream;
                import java.util.Arrays;
                import org.apache.commons.httpclient.Header;
                import org.apache.commons.httpclient.HttpClient;
                import org.apache.commons.httpclient.HttpMethod;
                import org.apache.commons.httpclient.URI;
                import org.apache.commons.httpclient.methods.GetMethod;
                import org.apache.commons.httpclient.methods.HeadMethod;
                import org.apache.tika.metadata.Metadata;
                import org.apache.tika.parser.AutoDetectParser;
                import org.apache.tika.parser.Parser;
                import org.xml.sax.helpers.DefaultHandler;

                // Everything below goes inside a method declared "throws Exception".
                HttpClient httpClient = new HttpClient();
                httpClient.getHttpConnectionManager().getParams().setConnectionTimeout(10000);
                httpClient.getHttpConnectionManager().getParams().setSoTimeout(10000);

                String address = "http://address of mp3 file here";

                // HEAD request: find the file length and whether ranges are supported.
                HttpMethod method = new HeadMethod();
                method.setURI(new URI(address, true));

                Header contentLengthHeader = null;
                Header acceptHeader = null;

                httpClient.executeMethod(method);
                try {
                    //System.out.println(Arrays.toString(method.getResponseHeaders()));
                    contentLengthHeader = method.getResponseHeader("Content-Length");
                    acceptHeader = method.getResponseHeader("Accept-Ranges");
                } finally {
                    method.releaseConnection();
                }

                if ((contentLengthHeader != null) && (acceptHeader != null)
                        && "bytes".equals(acceptHeader.getValue())) {
                    long contentLength = Long.parseLong(contentLengthHeader.getValue());
                    // ID3v1 lives in the last 128 bytes of the file.
                    long metaDataStartRange = contentLength - 128;
                    if (metaDataStartRange > 0) {
                        method = new GetMethod();
                        method.setURI(new URI(address, true));
                        // Byte positions are zero-based and inclusive, so the
                        // last byte is contentLength - 1.
                        method.addRequestHeader("Range", "bytes=" + metaDataStartRange
                                + "-" + (contentLength - 1));
                        System.out.println(Arrays.toString(method.getRequestHeaders()));
                        httpClient.executeMethod(method);
                        try {
                            Parser parser = new AutoDetectParser();

                            Metadata metadata = new Metadata();
                            metadata.set(Metadata.RESOURCE_NAME_KEY, address);
                            InputStream stream = method.getResponseBodyAsStream();
                            try {
                                parser.parse(stream, new DefaultHandler(), metadata);
                            } catch (Exception e) {
                                e.printStackTrace();
                            } finally {
                                stream.close();
                            }
                            System.out.println(Arrays.toString(metadata.names()));
                            System.out.println("Title: " + metadata.get("title"));
                            System.out.println("Author: " + metadata.get("Author"));
                        } finally {
                            method.releaseConnection();
                        }
                    }
                } else {
                    System.err.println("Range not supported. Headers were: ");
                    System.err.println(Arrays.toString(method.getResponseHeaders()));
                }
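
A note on the Range arithmetic, since it is easy to get off by one: byte
positions in a Range header are zero-based and inclusive, so the last 128
bytes of an N-byte file are N-128 through N-1. Below is a minimal sketch of
that calculation. The class and method names (RangeHeaders, lastBytes, suffix)
are made up for illustration and are not part of HttpClient or Tika. The
suffix form "bytes=-128" should fetch the same bytes without a HEAD request
first, assuming the server honours suffix ranges.

```java
final class RangeHeaders {

    /** Explicit range for the last n bytes of a resource of known length. */
    static String lastBytes(long contentLength, long n) {
        if (contentLength <= 0 || n <= 0) {
            throw new IllegalArgumentException("length and n must be positive");
        }
        // Clamp to 0 when the resource is shorter than n bytes.
        long start = Math.max(0, contentLength - n);
        // The end position is inclusive, hence the -1.
        long end = contentLength - 1;
        return "bytes=" + start + "-" + end;
    }

    /** Suffix range: the last n bytes, without knowing the length. */
    static String suffix(long n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return "bytes=-" + n;
    }

    public static void main(String[] args) {
        System.out.println(lastBytes(5000000L, 128)); // bytes=4999872-4999999
        System.out.println(suffix(128));              // bytes=-128
    }
}
```

Either string can be passed as the value of the "Range" request header. It is
probably worth checking for a 206 status on the response, since a server that
ignores the header will answer 200 and send the whole file.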


-----Original Message-----
From: Jonathan Koren [mailto:jonat...@soe.ucsc.edu]
Sent: Thursday, 19 February 2009 8:44 AM
To: tika-dev@lucene.apache.org
Subject: Re: Reading metadata without downloading entire file

ID3v1 is exactly 128 bytes [ http://en.wikipedia.org/wiki/ID3#Layout ]
In my copious free time, I might add ID3v2 support, unless of course
someone else does.
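
For reference, the 128-byte ID3v1 block is fixed-width: a 3-byte "TAG" marker,
then 30 bytes each for title, artist and album, 4 for year, 30 for comment,
and 1 genre byte. Here is a rough sketch of decoding it by hand. The Id3v1Tag
class is hypothetical (it is not Tika's parser), and decoding as ISO-8859-1 is
an assumption, since ID3v1 does not declare an encoding.

```java
import java.nio.charset.StandardCharsets;

final class Id3v1Tag {
    static final int TAG_SIZE = 128;

    final String title;
    final String artist;
    final String album;
    final String year;
    final String comment;
    final int genre;

    private Id3v1Tag(String title, String artist, String album,
                     String year, String comment, int genre) {
        this.title = title;
        this.artist = artist;
        this.album = album;
        this.year = year;
        this.comment = comment;
        this.genre = genre;
    }

    /** Returns null when the block is not 128 bytes or lacks the "TAG" marker. */
    static Id3v1Tag parse(byte[] block) {
        if (block == null || block.length != TAG_SIZE) {
            return null;
        }
        if (block[0] != 'T' || block[1] != 'A' || block[2] != 'G') {
            return null;
        }
        return new Id3v1Tag(
                field(block, 3, 30),   // title
                field(block, 33, 30),  // artist
                field(block, 63, 30),  // album
                field(block, 93, 4),   // year
                field(block, 97, 30),  // comment
                block[127] & 0xFF);    // genre index, read as unsigned
    }

    /** Reads a fixed-width field, stopping at the first NUL and trimming padding. */
    private static String field(byte[] b, int offset, int length) {
        int end = offset;
        while (end < offset + length && b[end] != 0) {
            end++;
        }
        return new String(b, offset, end - offset, StandardCharsets.ISO_8859_1).trim();
    }
}
```

The input would be the body of a Range response for the last 128 bytes of the
file. A null result means there is no ID3v1 tag at the end, which seems to be
the common case for podcasts.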

On Feb 18, 2009, at 2:04 PM, Nick Lothian wrote:

> Well that would explain it then!
>
> Has anyone had any experience with using http-range requests for the
> metadata? How many bytes from the end does the metadata start?
>
> Nick
>
> -----Original Message-----
> From: Jonathan Koren [mailto:jonat...@soe.ucsc.edu]
> Sent: Wednesday, 18 February 2009 5:30 PM
> To: tika-dev@lucene.apache.org
> Subject: Re: Reading metadata without downloading entire file
>
>
> You're closing the stream before the metadata arrives.
>
> Tika supports ID3v1 which is at the end of the file, not the
> beginning.
>
> On Feb 17, 2009, at 10:22 PM, Nick Lothian wrote:
>
>> I'm trying to get MP3 Metadata without downloading an entire MP3.
>>
>> I've set up a FilterInputStream which throws an
>> InterruptedIOException after a given amount of a file is downloaded.
>>
>> If I point this at an HTML page it works - I can get the title from
>> the metadata.
>>
>> If I point it at an MP3 file it doesn't give me any metadata at all
>> (except the Metadata.RESOURCE_NAME_KEY which I set), even if I set
>> the download length to be just less than the length of the file. If
>> I download the whole file it works.
>>
>> (JPGs don't seem to work either)
>>
>> Why is this so? My understanding was that Tika would work with
>> streams?
>
>
>
> --
> Jonathan Koren
> jonat...@soe.ucsc.edu
> http://www.soe.ucsc.edu/~jonathan/
>
>
>
> IMPORTANT: This e-mail, including any attachments, may contain
> private or confidential information. If you think you may not be the
> intended recipient, or if you have received this e-mail in error,
> please contact the sender immediately and delete all copies of this
> e-mail. If you are not the intended recipient, you must not
> reproduce any part of this e-mail or disclose its contents to any
> other party. This email represents the views of the individual
> sender, which do not necessarily reflect those of Education.au
> except where the sender expressly states otherwise. It is your
> responsibility to scan this email and any files transmitted with it
> for viruses or any other defects. education.au limited will not be
> liable for any loss, damage or consequence caused directly or
> indirectly by this email.

--
Jonathan Koren
jonat...@soe.ucsc.edu
http://www.soe.ucsc.edu/~jonathan/



