Hi

I think I would avoid adding more of these at the moment, especially
ones that aren't very specific (why is "package" Go and not Java?) or
that are for languages that haven't been around very long, unless they
solve a specific problem.

The original file has moved these into the magic files and made them more
sophisticated (Magdir/c-lang), but I doubt our regex code is fast enough
to get away with this. It is mostly stuff like ^ and leading spaces or
#s, though, so perhaps we could make the C searching code better instead;
I just copied what our old file version did. I'm not sure it is worth it.
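
For illustration only, here is a rough sketch of what a line-anchored
check in C could look like, without regex. It only accepts a word if it
starts a line (after optional spaces or tabs) and ends at a word
boundary. The table line_words and the function match_line_start are
made-up names for this sketch, not what text.c actually contains, and
the three entries are just examples.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table, same shape as text_words in text.c. */
static const char *line_words[][3] = {
	{ "#include", "C program", "text/x-c" },
	{ "package", "Go program", "text/x-go" },
	{ "import", "Java program", "text/x-java" },
};

/*
 * Return the description of the first table word that starts a line
 * (after optional spaces or tabs) and is not followed by another
 * word character, or NULL if no line begins with a table word.
 */
static const char *
match_line_start(const char *buf, size_t len)
{
	size_t i, j, n, wlen;
	unsigned char next;

	n = sizeof(line_words) / sizeof(line_words[0]);
	i = 0;
	while (i < len) {
		/* Skip leading blanks on this line. */
		while (i < len && (buf[i] == ' ' || buf[i] == '\t'))
			i++;

		for (j = 0; j < n; j++) {
			wlen = strlen(line_words[j][0]);
			if (len - i < wlen)
				continue;
			if (memcmp(buf + i, line_words[j][0], wlen) != 0)
				continue;
			/* Require a word boundary after the match. */
			if (i + wlen < len) {
				next = (unsigned char)buf[i + wlen];
				if (isalnum(next) || next == '_')
					continue;
			}
			return line_words[j][1];
		}

		/* Advance to the start of the next line. */
		while (i < len && buf[i] != '\n')
			i++;
		if (i < len)
			i++;
	}
	return NULL;
}
```

With this, "  package main" would report a Go program, while a line such
as "int x; /* package */" or "packages are nice" would not match. It
still walks the whole table once per line, so it does not make the cost
of adding entries go away.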


On Tue, Jan 15, 2019 at 02:00:11AM -0500, Ted Unangst wrote:
> Matteo Niccoli wrote:
> > Didn't find any other examples. At the moment rust code is recognized
> > as ASCII C program text.
> 
> src/usr.bin/file/text.c has an array of special matches for text.
> 
> It has various omissions, though.
> 
> <!doctype html> is matched as SGML.
> import means Java, but not python or go.
> 
> etc. I suppose it doesn't hurt to add a few more entries, but every entry
> slows down file. So we shouldn't go too wild.
> 
> Anyway, this adds support for go by matching "package". It also removes two
> entries that result in false positives if they match too soon.
> 
> Index: text.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/file/text.c,v
> retrieving revision 1.3
> diff -u -p -r1.3 text.c
> --- text.c    18 Apr 2017 14:16:48 -0000      1.3
> +++ text.c    15 Jan 2019 06:58:36 -0000
> @@ -31,14 +31,13 @@ static const char *text_words[][3] = {
>       { "import", "Java program", "text/x-java" },
>       { "\"libhdr\"", "BCPL program", "text/x-bcpl" },
>       { "\"LIBHDR\"", "BCPL program", "text/x-bcpl" },
> -     { "//", "C++ program", "text/x-c++" },
>       { "virtual", "C++ program", "text/x-c++" },
>       { "class", "C++ program", "text/x-c++" },
>       { "public:", "C++ program", "text/x-c++" },
>       { "private:", "C++ program", "text/x-c++" },
> -     { "/*", "C program", "text/x-c" },
>       { "#include", "C program", "text/x-c" },
>       { "char", "C program", "text/x-c" },
> +     { "package", "Go program", "text/x-go" },
>       { "The", "English", "text/plain" },
>       { "the", "English", "text/plain" },
>       { "double", "C program", "text/x-c" },
> 
