Guys, this isn't THAT stupid of a question is it? From my perspective,
the way PHP seems to see it is that I should already know what kind of
file I'm looking at. In most cases that's not an unreasonable
assumption. Unfortunately, that's only good for most cases. PHP is rich
in ways to work
Couldn't you just check the extension on the file?
On Mon, 2004-02-23 at 14:03, Axel IS Main wrote:
Guys, this isn't THAT stupid of a question is it? From my perspective,
the way PHP seems to see it is that I should already know what kind of
file I'm looking at. In most cases that's not an
Yes, and in fact that is what I am doing now. This is a spider bot
though, so I'm having to think of every single type of binary file that
could be linked to on the web. So far I'm up to 28 with no end in sight.
What about a .com file? I can't omit links that end in .com can I? That
would be
Well, you can check the MIME type of the file, e.g.:
$mimes = array(1 => 'application/octet-stream',
               2 => 'image/jpeg',
               // etc.
              );
For more info...
http://us4.php.net/manual/en/ref.filesystem.php
Just like the upload file function you can check for the mime types...
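For what it's worth, here is a rough sketch of the MIME check for a page fetched over HTTP. It assumes you have the raw response header lines available as an array (PHP has traditionally exposed these as $http_response_header after file_get_contents() on an http:// URL). The function names and the allow-list of "textual" types are made up for illustration; they are not from the manual.

```php
<?php
// Hypothetical helper: pull the media type out of a list of raw HTTP
// response header lines, e.g. "Content-Type: text/html; charset=UTF-8".
function content_type_from_headers(array $headers): ?string
{
    $type = null;
    foreach ($headers as $line) {
        // Keep the last Content-Type seen, in case redirects added several.
        if (preg_match('/^Content-Type:\s*([^;\s]+)/i', $line, $m)) {
            $type = strtolower($m[1]);
        }
    }
    return $type;
}

// Hypothetical allow-list: text/* plus a few XML/JSON types count as text.
function is_textual_mime(?string $mime): bool
{
    if ($mime === null) {
        return false; // unknown type: the caller decides what to do
    }
    $extra = array('application/xhtml+xml', 'application/xml', 'application/json');
    return strpos($mime, 'text/') === 0 || in_array($mime, $extra, true);
}

// Possible use after a fetch (needs network access, so not run here):
//   $body = file_get_contents($url);
//   $ok   = is_textual_mime(content_type_from_headers($http_response_header));
```

Keep in mind that servers can send a wrong or missing Content-Type, so this is probably best used together with a look at the content itself.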
Hello Axel,
Monday, February 23, 2004, 7:03:38 PM, you wrote:
AIM> Guys, this isn't THAT stupid of a question is it? From my perspective,
AIM> the way PHP seems to see it is that I should already know what kind of
AIM> file I'm looking at. In most cases that's not an unreasonable
AIM> assumption.
Well, actually, to check for .com, just make sure the link contains a /
before the .com. That will filter yahoo.com, but keep yahoo.com/downloadme.com
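A small sketch of that idea, under the assumption that what matters is whether the extension sits in the path rather than in the host name. The helper name and the extension list passed to it are hypothetical; parse_url() and pathinfo() are the standard PHP functions.

```php
<?php
// Hypothetical helper: true when the URL's path (not its host name) ends
// in one of the given extensions.
function path_has_extension(string $url, array $extensions): bool
{
    // parse_url() needs a scheme to tell host from path, so assume http
    // when the link has none.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return false; // no path at all, e.g. a bare host like yahoo.com
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return $ext !== '' && in_array($ext, $extensions, true);
}
```

With array('com', 'exe') as the list, yahoo.com is left alone while yahoo.com/downloadme.com is flagged.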
On Mon, 2004-02-23 at 14:19, Axel IS Main wrote:
Yes, and in fact that is what I am doing now. This is a spider bot
though, so I'm having to think of every
Generally, binaries have \0 in them, but not necessarily.
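As a quick illustration, a NUL check could look something like the following. The function name and the 8192-byte limit are arbitrary choices for the sketch. As said, this is only a heuristic: a binary file might contain no NUL at all, and UTF-16 encoded text normally does contain them.

```php
<?php
// Hypothetical helper: report whether the first $limit bytes contain a NUL.
function contains_nul(string $data, int $limit = 8192): bool
{
    return strpos(substr($data, 0, $limit), "\0") !== false;
}
```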
Axel IS Main wrote:
Guys, this isn't THAT stupid of a question is it? From my perspective,
the way PHP seems to see it is that I should already know what kind of
file I'm looking at. In most cases that's not an unreasonable
Hello Axel,
Monday, February 23, 2004, 7:38:25 PM, you wrote:
AIM> Thanks, you just gave me the solution, I think. I don't have to strip
AIM> out every character above standard ascii, I just have to look for them.
AIM> If one is there, then just get rid of it. It's true that an OS can't
AIM> tell
Thanks, that's very helpful. It beats the heck out of doing it the way
I've been doing it.
Richard Davey wrote:
Hello Axel,
Monday, February 23, 2004, 7:38:25 PM, you wrote:
AIM> Thanks, you just gave me the solution, I think. I don't have to strip
AIM> out every character above standard ascii,
On Monday 23 February 2004 11:55 am, Richard Davey wrote:
Hello Axel,
Monday, February 23, 2004, 7:38:25 PM, you wrote:
AIM> Thanks, you just gave me the solution, I think. I don't have to strip
AIM> out every character above standard ascii, I just have to look for
AIM> them. If one is there,
Hello Evan,
Monday, February 23, 2004, 8:57:43 PM, you wrote:
It would be wise to check for characters from 0 to 31; if they appear,
then it's almost certainly (but not guaranteed) binary.
EN> Assuming that's decimal, you're including 0x09 0x0a and 0x0d which are,
EN> respectively, tab, line
That's not bad, but I found a way to do it simply using chr() and
passing it a value. It turns out that if I go 0-31, almost nothing will
get through. Even the simplest HTML has something in there from that
list. However, by just looking between 14 and 26, one more than carriage
return, and one
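For anyone following along, the chr() approach described above might be sketched like this. The function name and the parameter defaults are my own assumptions; the defaults use the 14 to 26 range mentioned, which leaves tab (9), line feed (10) and carriage return (13) alone.

```php
<?php
// Hypothetical helper: true if any byte with a code from $low to $high
// (inclusive) occurs in $data. Defaults follow the 14-26 range above.
function has_control_bytes(string $data, int $low = 14, int $high = 26): bool
{
    for ($code = $low; $code <= $high; $code++) {
        // chr() turns the numeric code into the one-byte string to look for.
        if (strpos($data, chr($code)) !== false) {
            return true;
        }
    }
    return false;
}
```

Passing 0 and 31 instead reproduces the wider check, which also trips on ordinary tabs and line breaks.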
On Monday 23 February 2004 03:02 pm, Axel IS Main wrote:
That's not bad, but I found a way to do it simply using chr() and
passing it a value. It turns out that if I go 0-31, almost nothing will
get through. Even the simplest HTML has something in there from that
list. However, by just looking
Alternatively, count unigrams in the first 1000 characters and get the
euclidean distance to a sample from e.g. an english text, a french
text, a chinese text, etc.
- Lucas
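A rough sketch of the unigram idea, in case it helps. It assumes "unigrams" means single bytes (so multi-byte text such as Chinese is compared byte by byte), and that the caller supplies the reference samples. All function names are hypothetical, and no cut-off distance for "binary" is suggested here since none was given.

```php
<?php
// Relative frequency of each of the 256 byte values in the first $limit bytes.
function unigram_profile(string $data, int $limit = 1000): array
{
    $sample = substr($data, 0, $limit);
    $length = strlen($sample);
    $counts = count_chars($sample, 0); // array indexed 0..255
    $profile = array();
    foreach ($counts as $byte => $count) {
        $profile[$byte] = $length > 0 ? $count / $length : 0.0;
    }
    return $profile;
}

// Euclidean distance between two 256-entry profiles.
function euclidean_distance(array $a, array $b): float
{
    $sum = 0.0;
    for ($i = 0; $i < 256; $i++) {
        $diff = $a[$i] - $b[$i];
        $sum += $diff * $diff;
    }
    return sqrt($sum);
}

// Returns array(name, distance) for the closest reference sample.
// $references maps a label (e.g. 'english') to a sample text.
function nearest_reference(string $data, array $references): array
{
    $profile = unigram_profile($data);
    $bestName = null;
    $bestDistance = INF;
    foreach ($references as $name => $text) {
        $distance = euclidean_distance($profile, unigram_profile($text));
        if ($distance < $bestDistance) {
            $bestName = $name;
            $bestDistance = $distance;
        }
    }
    return array($bestName, $bestDistance);
}
```

A page whose smallest distance is still large compared with known text pages would then be a candidate for skipping; where to draw that line would have to be tuned on real data.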
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Richard Davey wrote:
Hello Axel,
Monday, February 23, 2004, 7:03:38 PM, you wrote:
AIM> Guys, this isn't THAT stupid of a question is it? From my perspective,
AIM> the way PHP seems to see it is that I should already know what kind of
AIM> file I'm looking at. In most cases that's not an
I'm using file_get_contents() to open URLs. Does anyone know if there is
a way to look at the result and determine if the file is binary? I'd
like to be able to block binaries from being processed without having to
try to think of all the possible binary extensions and omit them with a