RE: One for next week

James Chapman-Smith Thu, 25 Nov 2010 23:19:13 -0800

Hi Ian,


I just did a test of the speed of removing the invalid chars using brute
force. Here's my code:

var invalids = System.IO.Path.GetInvalidPathChars()
    .Union(System.IO.Path.GetInvalidFileNameChars());

var text = new string('x', 200000);

var query = from c in text
            where !invalids.Contains(c)
            select c;

var clean = new string(query.ToArray());

My computer manages to strip the chars from a 139,000 character string in
about a second - timed using System.Diagnostics.Stopwatch.  So for many
circumstances I think that a brute force approach is quite workable. What do
you think?
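
One possible refinement, as a sketch only: in the code above, invalids is a lazy Union sequence, so each Contains call walks it again. Copying the characters into a HashSet<char> once should make each lookup a constant-time check on average. The class name PathCleaner and the method name Strip are made up for illustration, and I have not timed this version.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper; the names are illustrative, not from the original test.
public static class PathCleaner
{
    // Build the lookup set once. HashSet<char> drops the duplicates that
    // appear in both arrays, so Concat is enough here.
    private static readonly HashSet<char> Invalids = new HashSet<char>(
        Path.GetInvalidPathChars().Concat(Path.GetInvalidFileNameChars()));

    // Returns text with every invalid path or file name character removed.
    public static string Strip(string text)
    {
        return new string(text.Where(c => !Invalids.Contains(c)).ToArray());
    }
}
```

Calling PathCleaner.Strip(text) in place of the query, wrapped in the same Stopwatch, would give a like-for-like comparison.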

 

Cheers.

 

James.

 

From: [email protected] [mailto:[email protected]]
On Behalf Of Ian Thomas
Sent: Friday, 26 November 2010 17:22
To: 'ozDotNet'
Subject: One for next week

 

My regex is very irregular, so some ideas would be nice.

Problem: excluding the prohibited characters from file paths and file names.
I started off thinking that \ / : * ? " < > | would be about the maximum,
and I would just pass the filenames (generated from text titles, e.g. books,
videos, etc.) through a simple looping routine looking for the 9 prohibited
characters.

Using a simple regex, Regex.Replace(strIn, "[^\...@-]", "") is too
restrictive - for example, bracketed numbers (1), [23], etc are very common.
I've devoted too long to expanding this without much joy, and would
appreciate help.  

In my research, I discovered these two helpful methods in System.IO,
which is why I abandoned my first approach, comparing characters and arrays,
to explore whether regular expressions might help.

Path.GetInvalidPathChars() - Get a list of invalid path characters (returns
an array of Char)

and

Path.GetInvalidFileNameChars() - Get a list of invalid filename characters
(returns an array of Char)

The number returned is surprisingly large, so iterating through even a
50-character long filename / path name and checking for the undesirable
characters would take considerably longer than doing the same for 9
characters.
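
For the regex route, one way this might be done is to build a character class from the two arrays rather than listing the allowed characters by hand, so that brackets such as (1) and [23] are left alone. This is a minimal sketch under assumptions: the class name RegexPathCleaner and the method name Strip are invented for illustration, and each character is written as a \uXXXX escape so that nothing inside the class needs special handling.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class RegexPathCleaner
{
    // The pattern is built once and reused for every call.
    private static readonly Regex InvalidChars = BuildRegex();

    private static Regex BuildRegex()
    {
        // Turn each invalid character into a \uXXXX escape and wrap the
        // lot in a character class, e.g. [\u0000\u002F...].
        var escaped = Path.GetInvalidPathChars()
            .Union(Path.GetInvalidFileNameChars())
            .Select(c => "\\u" + ((int)c).ToString("X4"));
        return new Regex("[" + string.Concat(escaped) + "]", RegexOptions.Compiled);
    }

    // Removes every character matched by the invalid-character class.
    public static string Strip(string input)
    {
        return InvalidChars.Replace(input, "");
    }
}
```

Because the class only names the characters the framework reports as invalid, everything else in a title passes through unchanged.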

  _____  

Ian Thomas
Victoria Park, Western Australia
