RE: New radio PIDs, more than 8 characters - "solved"

2017-08-23 Thread Hugh Reynolds
Vangelis,

Unlocker only reported "Unlock and Delete Failed" for each of the three
files, but it didn't actually fail, because the files were deleted.  I have
no other diagnostic data.

Once again, many thanks.

Regards
Hugh

-----Original Message-----
From: get_iplayer [mailto:get_iplayer-boun...@lists.infradead.org] On Behalf
Of Vangelis forthnet
Sent: 24 August 2017 02:17
To: get_iplayer@lists.infradead.org
Subject: Re: New radio PIDs, more than 8 characters - "solved"

 On Wed Aug 23 16:44:30 BST 2017, Hugh Reynolds wrote: 

> Reboot didn't help
> Uninstall and Reboot didn't help
> iobit-unlocker helped.
> 
> Clean install is now working. 
> 
> Many, many thanks. 

 You're welcome! I'm glad you're back up and running :-) For the sake of
completeness/closure, were you even able to determine (via Unlocker) the
process(es) locking those .exe files even after a system reboot? 
Doesn't make much sense to me... 
Perhaps an overzealous antimalware suite?

 Using Unlocker is just a workaround;
if you do not discover the root cause
of your issue and rectify it, then there's a chance you'll revisit the issue
in the next GiP update (?) - are you able now to, e.g., send ffmpeg.exe
(temporarily) to the Recycle Bin? Just my 2p, I am not jinxing your future
GiP upgrades! 

Best wishes,
Vangelis.

___
get_iplayer mailing list
get_iplayer@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/get_iplayer






Re: New radio PIDs, more than 8 characters - "solved"

2017-08-23 Thread Vangelis forthnet
On Wed Aug 23 16:44:30 BST 2017, Hugh Reynolds wrote:

> Reboot didn't help
> Uninstall and Reboot didn't help
> iobit-unlocker helped.
>
> Clean install is now working.
>
> Many, many thanks.


You're welcome! I'm glad you're back up and running :-) 
For the sake of completeness/closure, 
were you even able to determine (via Unlocker) 
the process(es) locking those .exe files 
even after a system reboot? 
Doesn't make much sense to me... 
Perhaps an overzealous antimalware suite?


Using Unlocker is just a workaround; 
if you do not discover the root cause 
of your issue and rectify it, then there's 
a chance you'll revisit the issue in the next 
GiP update (?) - are you able now to, e.g., 
send ffmpeg.exe (temporarily) to the 
Recycle Bin? Just my 2p, I am not 
jinxing your future GiP upgrades! 

Best wishes, 
Vangelis.




RE: New radio PIDs, more than 8 characters - "solved"

2017-08-23 Thread Hugh Reynolds
Vangelis, All,

Sorry for the Hijack.
Reboot didn't help
Uninstall and Reboot didn't help
iobit-unlocker helped.

Clean install is now working.

Many, many thanks. 

Hugh






Re: New radio PIDs, more than 8 characters - "solved"

2017-08-17 Thread Ralph Corderoy
Hi C. E.,

> for a high-level language, [Perl's] syntax is unnecessarily difficult
> and obscure.

Perl's syntax is heavy on notation, but then notation is powerful
compared to the long-hand alternatives, and that's why it's fine in
maths, chemistry, and Perl.  For the occasional visitor, Python's that
way →

Perl uses sigils, a symbol attached to a variable name, but then so does
sh(1) that influenced it.  Many languages do, Ruby, PHP, ..., though not
all the time, e.g. Python has `@foo' to mean it's a decorator function.
Perl also uses sigils to indicate the type of an identifier, so `$foo' is
a simple scalar variable whereas `@bar' is an indexable list.  Similar
syntax differences allow literals to be given:  `[42, 314, "xyzzy"]' is
a list whereas `{May => 10, Hammond => 11}' is a `hash', AKA associative
array or dictionary.
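
As a minimal sketch of the sigil distinction (the variable names are
invented for illustration):

```perl
use strict;
use warnings;

# '$' marks a scalar, '@' an array, '%' a hash.
my $foo  = 42;
my @bar  = (42, 314, "xyzzy");
my %rank = (May => 10, Hammond => 11);

# Square and curly brackets build anonymous versions, held by reference.
my $list = [42, 314, "xyzzy"];
my $hash = { May => 10, Hammond => 11 };

# A single element is itself a scalar, so it is written with '$'.
print "$foo $bar[2] $rank{May}\n";          # 42 xyzzy 10
print "$list->[1] $hash->{Hammond}\n";      # 314 11
```

Note that the sigil follows what is being fetched, not the container, which
is why one element of `@bar' is written `$bar[2]'.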

Perl is no harder to learn than C or Ruby.  They both like notation too,
e.g. the «int foo(int, int (*)(void *, char *), void *);» I wrote
recently.  Perl's easier than PHP because that has far too much
duplication, bad design, and corner cases to memorise.  C++ is also
something to avoid;  too large a language and each coder uses a distinct
subset.  Assembly languages are easy, once you understand a CPU's
workings, but RISC ones like ARM are nice to learn compared to the
twisty passages of x86.

> The whole point of high-level languages, the reason they were
> invented, was to make programming more human-readable and therefore
> more understandable, but Perl bucks that trend.

I think it was to give more expressive power than assembly language by
introducing abstraction, and notation, at the cost of efficiency.
Plenty of Unix programmers with a sh, sed, awk, background found Perl
straightforward to pick up because it distilled their features into a
single language.  It was Perl 4 when I learnt it, and a single
well-written man page described the language and that's all the
documentation there was.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy



Re: New radio PIDs, more than 8 characters - "solved"

2017-08-16 Thread SAO
Would you two please take this childish spat to private email and save the  
rest of us from terminal boredom.


On Wed, 16 Aug 2017 17:20:10 +0100, David Cantrell wrote:

> On Wed, Aug 16, 2017 at 04:57:30PM +0100, C E Macfarlane wrote:
>
> > But, since you are obviously spoiling for a fight, why should anyone
> > listen to someone who has confessed to being a part of putting all
> > that massive bloat in BBC web pages
>
> [citation needed]
>
> > presumably therefore you will feel at home
> > bloating my spam folder henceforth.  Bye, bye.
>
> Awww, poor baby who can't bear to hear that he's wrong.

--
.



Re: New radio PIDs, more than 8 characters - "solved"

2017-08-16 Thread David Cantrell
On Wed, Aug 16, 2017 at 04:57:30PM +0100, C E Macfarlane wrote:

> But, since you are obviously spoiling for a fight, why should anyone listen
> to someone who has confessed to being a part of putting all that massive
> bloat in BBC web pages

[citation needed]

> presumably therefore you will feel at home
> bloating my spam folder henceforth.  Bye, bye.

Awww, poor baby who can't bear to hear that he's wrong.

-- 
David Cantrell | Bourgeois reactionary pig

I think the most difficult moment that anyone could face is seeing
their domestic servants, whether maid or drivers, run away
  -- Abdul Rahman Al-Sheikh, writing on 25 Jan 2004 at
 http://www.arabnews.com/node/243486



RE: New radio PIDs, more than 8 characters - "solved"

2017-08-16 Thread C E Macfarlane
> On Wed, Aug 16, 2017 at 04:16:33PM +0100, C E Macfarlane wrote:
>
> > as for arcane-ness of language and difficulty in reading
> > it's about on a par with The Bible!
>
> That's what everyone thinks about languages that they are too damned
> lazy to learn.

Laziness doesn't enter into it, the list of languages I've already learnt
includes others, such as Assembler (for three different processors), that
were much harder to learn than Perl would ever be, but the fact remains
that, for a high-level language, its syntax is unnecessarily difficult and
obscure.  The whole point of high-level languages, the reason they were
invented, was to make programming more human-readable and therefore more
understandable, but Perl bucks that trend.

But, since you are obviously spoiling for a fight, why should anyone listen
to someone who has confessed to being a part of putting all that massive
bloat in BBC web pages  -  presumably therefore you will feel at home
bloating my spam folder henceforth.  Bye, bye.




Re: New radio PIDs, more than 8 characters - "solved"

2017-08-16 Thread David Cantrell
On Wed, Aug 16, 2017 at 04:16:33PM +0100, C E Macfarlane wrote:

> as for arcane-ness of language and difficulty in reading
> it's about on a par with The Bible!

That's what everyone thinks about languages that they are too damned
lazy to learn.

-- 
David Cantrell | Nth greatest programmer in the world

Erudite is when you make a classical allusion to a
feather.  Kinky is when you use the whole chicken.



RE: New radio PIDs, more than 8 characters - "solved"

2017-08-16 Thread C E Macfarlane
See reply below ...
--
www.macfh.co.uk/MacFH.html

> The first two google results for "perl regular expression" not good
> enough for you :-)

I'm sure they would have been, but that wasn't what I searched for.  I
searched for something more precise.

> BTW, it's Perl or perl, not PERL. Perl is the name of the
> language, perl
> is the name of the interpreter.

I'd always understood it to be an acronym, but looking it up in response to
your post, acronyms have been applied to it, but after the event, and its
true derivation seems to have been from a Biblical quote  -  quite
appropriate really, as for arcane-ness of language and difficulty in reading
it's about on a par with The Bible!




Re: New radio PIDs, more than 8 characters - "solved"

2017-08-16 Thread David Cantrell
On Tue, Aug 15, 2017 at 03:06:51PM +0100, C E Macfarlane wrote:

> Yes, I was aware of \b support in some languages, but RE support varies
> across languages, and, knowing this but not being experienced in PERL, I
> checked at least two online sources for PERL REs and could find no evidence
> of support for it.

The first two google results for "perl regular expression" not good
enough for you :-)

BTW, it's Perl or perl, not PERL. Perl is the name of the language, perl
is the name of the interpreter.

-- 
David Cantrell | Hero of the Information Age

If you have received this email in error, please add some nutmeg
and egg whites, whisk, and place in a warm oven for 40 minutes.



RE: New radio PIDs, more than 8 characters - "solved"

2017-08-16 Thread C E Macfarlane
More on REs ...
--
www.macfh.co.uk/MacFH.html

> > would we both agree with?:
> > \b[bpw][0-9][a-z0-9]{7,13}\b
>
> I think it's
>
> \b[bpw]\d[b-df-hj-np-tv-z\d]{6,13}\b
>
> to cover the existing ones that are eight long, up to the 15-long
> w172vg029mkl852 that Vangelis mentioned.  And we may as well
> borrow from
> the specification and cut out the vowels rather than allow a-z.

Yes, I agree.

I see that my 7 at the end was an actual error, but for the rest of it
you're saving some characters in capturing the second character, and you're
being more precise in the tail.  I think your suggestion should work well,
and am happy to agree with it.

> I'd probably put all of it other than the two `\b' into a
> variable with
> qr//, and then embed that in regexps as needed, adding `()', or `\b',
> etc., back.

... where and as needed.  Yes, a sensible approach.

> Out of interest, I've looked at 3.01's get_iplayer for "]0" to see how
> it already uses it.
>
>  941 if ( $this->{pid} !~ m{^([pb]0[a-z0-9]{6})$} ) {
>
> $1 doesn't seem to be used afterwards, so the `()' aren't needed.
>
> 3359 if ( $prog->{pid} =~
> m{^http.+\/([pb]0[a-z0-9]{6})\/?.*$} ) {
>
> The `/' are unnecessarily backslashed given that m{} is used
> so the `/'
> doesn't have special meaning.  The `.+' means the last thing to match
> the PID RE is used.  The `/?' makes the terminating slash
> optional, but
> this means "http://.../p0abc123def" matches, but $1 ignores the "def".
> The `.*$' isn't wanted as it's always true.
>
> 4409 $pid = $1 if $prog->{pid} =~
> /\/([bp]0[a-z0-9]{6})/
>
> This time the first PID-like thing would be used.  Again, a
> "def" would
> be ignored.
>
> 4416 if ( $pid !~ /^[bp]0[a-z0-9]{6}$/ ) {
> 4521 if ( $pid !~ /^[bp]0[a-z0-9]{6}$/ && $pid !~
> /^http/ ) {
> 4531 if ( $pid =~ /^[bp]0[a-z0-9]{6}$/ ) {
> 4603 if ( $pid =~ /^[bp]0[a-z0-9]{6}$/ ) {
> 4686 } elsif ( $prog->{pid} =~ /^[bp]0[a-z0-9]{6}$/ ) {
>
> All the same.  Fine.
>
> 5095 if ( $prog->{pid} !~ m{^([pb]0[a-z0-9]{6})$} ) {
>
> "pb" rather than "bp", just for spice.  No need to capture.
>
> 5253 return $1 if $_[0] =~ m{/?([wpb]0[a-z0-9]{6})};
>
> This one has a `w'!

I would cry out "Gawdon Bennet!", but he wouldn't hear me from shaking his
head in disbelief.  Even after Martin Clark's post giving a tally of them
all, the full horror of it doesn't really sink in until you see them all
listed together as you have done.  It really is a classic example of the
need to declare a multiply-used value up front at the top of the program
as a constant or variable, and why this need is the very first item in my
programming check-list!




Re: New radio PIDs, more than 8 characters - "solved"

2017-08-16 Thread RS
From: Jim web
Sent: Wednesday, August 16, 2017 09:58

> I've not encountered any problem thus far, so currently am asking just for
> clarification.

Have a look at World Service programmes from Friday onwards.
http://www.bbc.co.uk/programmes/p002w6r2/episodes/downloads
http://www.bbc.co.uk/programmes/p016tl04/broadcasts/2017/08

> 3) If I still need a validation regex string, what should it be and how
> would I make the change?

See
http://lists.infradead.org/pipermail/get_iplayer/2017-August/011020.html
or use the podcast if there is one.




Re: New radio PIDs, more than 8 characters - "solved"

2017-08-16 Thread Ralph Corderoy
Hi C. E.,

> So, yielding to your superior knowledge of PERL, for the sake of
> clarity for the benefit of those who may have had difficulty in
> following the nuances of the argument, or been confused by the
> multiple suggestions, would we both agree with?:
>   \b[bpw][0-9][a-z0-9]{7,13}\b

I think it's

\b[bpw]\d[b-df-hj-np-tv-z\d]{6,13}\b

to cover the existing ones that are eight long, up to the 15-long
w172vg029mkl852 that Vangelis mentioned.  And we may as well borrow from
the specification and cut out the vowels rather than allow a-z.

I'd probably put all of it other than the two `\b' into a variable with
qr//, and then embed that in regexps as needed, adding `()', or `\b',
etc., back.
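
A minimal sketch of that approach, assuming the pattern above; the names
`$pid_re', `is_pid' and `find_pid' are invented here and are not
get_iplayer's own:

```perl
use strict;
use warnings;

# The bare PID pattern, compiled once, with no anchors or captures.
my $pid_re = qr/[bpw]\d[b-df-hj-np-tv-z\d]{6,13}/;

# Validate a string that should be nothing but a PID: anchor both ends.
sub is_pid {
    my ($s) = @_;
    return $s =~ /^$pid_re$/ ? 1 : 0;
}

# Pull the first PID-like word out of longer text: add `\b' and `()' back.
sub find_pid {
    my ($text) = @_;
    return $text =~ /\b($pid_re)\b/ ? $1 : undef;
}

print is_pid("b0910w0x"), "\n";           # 1
print is_pid("w172vg029mkl852"), "\n";    # 1
print is_pid("programmes"), "\n";         # 0
print find_pid("http://www.bbc.co.uk/programmes/b08xy0gl"), "\n";  # b08xy0gl
```

Each use site then states only what it adds (anchors, boundaries, a capture),
and a change to the PID format is made in one place.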

Out of interest, I've looked at 3.01's get_iplayer for "]0" to see how
it already uses it.

 941 if ( $this->{pid} !~ m{^([pb]0[a-z0-9]{6})$} ) {

$1 doesn't seem to be used afterwards, so the `()' aren't needed.

3359 if ( $prog->{pid} =~ m{^http.+\/([pb]0[a-z0-9]{6})\/?.*$} ) {

The `/' are unnecessarily backslashed given that m{} is used so the `/'
doesn't have special meaning.  The `.+' means the last thing to match
the PID RE is used.  The `/?' makes the terminating slash optional, but
this means "http://.../p0abc123def" matches, but $1 ignores the "def".
The `.*$' isn't wanted as it's always true.

4409 $pid = $1 if $prog->{pid} =~ /\/([bp]0[a-z0-9]{6})/

This time the first PID-like thing would be used.  Again, a "def" would
be ignored.
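
The last-versus-first behaviour of those two lines can be checked with a
short sketch.  The URLs are made up, and the third pattern is only one
possible tightening, not anything taken from get_iplayer:

```perl
use strict;
use warnings;

my $url = "http://example.invalid/b0aaaaaa/p0bbbbbbdef";

# Line 3359's form: the greedy `.+' backtracks from the end of the string,
# so the LAST candidate wins, and the trailing "def" is swallowed by `.*'.
my ($last) = $url =~ m{^http.+\/([pb]0[a-z0-9]{6})\/?.*$};

# Line 4409's form: unanchored, so the FIRST candidate wins.
my ($first) = $url =~ /\/([bp]0[a-z0-9]{6})/;

# Assumed tightening: the PID must be followed by `/', `?', `#' or the end.
my $strict = qr{/([bp]0[a-z0-9]{6})(?=[/?#]|$)};
my ($ok)  = $url =~ $strict;
my ($bad) = "http://example.invalid/p0abc123def" =~ $strict;

print "$last $first $ok\n";                      # p0bbbbbb b0aaaaaa b0aaaaaa
print defined $bad ? $bad : "no match", "\n";    # no match
```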

4416 if ( $pid !~ /^[bp]0[a-z0-9]{6}$/ ) {
4521 if ( $pid !~ /^[bp]0[a-z0-9]{6}$/ && $pid !~ /^http/ ) {
4531 if ( $pid =~ /^[bp]0[a-z0-9]{6}$/ ) {
4603 if ( $pid =~ /^[bp]0[a-z0-9]{6}$/ ) {
4686 } elsif ( $prog->{pid} =~ /^[bp]0[a-z0-9]{6}$/ ) {

All the same.  Fine.

5095 if ( $prog->{pid} !~ m{^([pb]0[a-z0-9]{6})$} ) {

"pb" rather than "bp", just for spice.  No need to capture.

5253 return $1 if $_[0] =~ m{/?([wpb]0[a-z0-9]{6})};

This one has a `w'!

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy



Re: New radio PIDs, more than 8 characters - "solved"

2017-08-16 Thread Jim web
The discussion prompts some questions on my part:

1) I always used -pid  to specify a programme rather than other
methods. Does this *have* to have a validation check by regex? I'd assume
it doesn't need to parse an entire url because it could just tack the value
I give onto the standard parts. Then tell me it can't find something if I
mistyped a pid.

2) If it doesn't need to check, is there a way to tell gip not to do so?
Thus dodging this problem entirely?

3) If I still need a validation regex string, what should it be and how
would I make the change?

I've not encountered any problem thus far, so currently am asking just for
clarification.

Jim

-- 
Electronics  https://www.st-andrews.ac.uk/~www_pa/Scots_Guide/intro/electron.htm
Armstrong Audio  http://www.audiomisc.co.uk/Armstrong/armstrong.html
Audio Misc  http://www.audiomisc.co.uk/index.html




Re: New radio PIDs, more than 8 characters - "solved"

2017-08-15 Thread Vangelis forthnet

On Mon Aug 14 13:19:14 BST 2017, M Clark wrote:

> changing all 7 occurrences (sigh...) of
> [bp]0[a-z0-9]{6}
> to
> (?:[bp]0[a-z0-9]{6}|w[a-z0-9]{7,14})
> solves the w3*, w1* problem for Me.


Hi Martin; that new code still assumes that both Red Bee & PIPs
PIDs will have "0" as the second character in the string.
I am not saying this is something that will have to be dealt with soon,
but I've watched Red Bee PIDs move from "b08*" to "b09*"
and, recently, from "b0909***" to "b0910***", e.g. "b0910w0x".

If this is a pattern, then I expect strings like "b0999***" to appear in
the future; and the next logical (?) step would be strings beginning with
"b1**" (in which case the amended code will break..).
Pure speculation on my part, though...

I haven't done the maths myself (the number of possible 7-character
alphanumeric strings); this is supposed to be a huge integer; but, as PIDs
are unique (can't be recycled), linked to a specific audio-visual offering
from the beeb, that huge number is bound to be exhausted sometime...
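
For what it's worth, the arithmetic is quick to do.  This sketch assumes
seven free positions after the leading letter, drawn either from all of
a-z plus 0-9 (36 symbols) or from the vowel-free set mentioned elsewhere
in the thread (31 symbols):

```perl
use strict;
use warnings;

my $positions   = 7;
my $with_vowels = 36 ** $positions;   # a-z and 0-9
my $no_vowels   = 31 ** $positions;   # 21 consonants and 0-9

print "36^7 = $with_vowels\n";   # 36^7 = 78364164096
print "31^7 = $no_vowels\n";     # 31^7 = 27512614111
```

So that is tens of thousands of millions of strings per leading letter,
which is large but, as noted, finite.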

Regards,
Vangelis. 





RE: New radio PIDs, more than 8 characters - "solved"

2017-08-15 Thread C E Macfarlane
More on REs ...
--
www.macfh.co.uk/MacFH.html

> > Yes, I was aware of \b support in some languages, but RE support
> > varies across languages, and, knowing this but not being experienced
> > in PERL, I checked at least two online sources for PERL REs
> and could
> > find no evidence of support for it.
>
> One is http://perldoc.perl.org/perlre.html#Assertions

Obviously my brief research was too brief!

So, yielding to your superior knowledge of PERL, for the sake of clarity for
the benefit of those who may have had difficulty in following the nuances of
the argument, or been confused by the multiple suggestions, would we both
agree with?:
\b[bpw][0-9][a-z0-9]{7,13}\b

> It was Perl that invented `\b', along with many of the other
> conventions
> that spread to other implementations, e.g. `\d' for digit, a
> `?' suffix
> for non-greedy as in /<.*?>/, the otherwise invalid `?' after an open
> parenthesis as a gateway for further flags like the
> non-capturing `:' in
> /(.)(?:.)(.)/, etc.  Larry Wall was very knowledgable of the Unix
> programming environment, including the various regular expression
> syntaxes in sed, grep, egrep, ..., and came up with a consistent
> almost-superset that had some nice conveniences too.

As it happens I've been doing some Bash scripting over the last week or so.

> > True, but if that is starting to happen, then one of the
> other 'rules'
> > was to break a monolithic program into blocks
>
> Alas, AFAIK, get_iplayer wishes to ship as a single file.

You can still break a single file down into blocks, both by using
subroutines/functions or even just by appropriate layout and commenting, and
in both cases individually testing that the resulting sections do what is
expected of them.




RE: New radio PIDs, more than 8 characters - "solved"

2017-08-15 Thread C E Macfarlane
More about Regular Expressions (REs), which can be safely ignored by those
not interested ...
--
www.macfh.co.uk/MacFH.html

> > it might be necessary to bracket it at the beginning and end with
> > non-capturing non-word meta- or pseudo-characters
>
> Rather than \W, representing a single non-word character, \b would be
> better, meaning a zero-width boundary between a word, \w, and
> non-word,
> \W, character, or the start or end of the string.  /\Wfoo\W/ matches
> ":foo:", but not "foo", but /\bfoo\b/ matches both.  The zero-width
> means it consumes nothing;  the test is of the character either side.

Yes, I was aware of \b support in some languages, but RE support varies
across languages, and, knowing this but not being experienced in PERL, I
checked at least two online sources for PERL REs and could find no evidence
of support for it.  If you're a PERL programmer and therefore know that it
works, I concede to your superior knowledge.

RE support has varied from none to very complete with every language I've
programmed, which includes Assembler, Bash, BASIC, C, COBOL, SQL, and
Python, but these days I'm more used to HTML and JavaScript, and before that
I was doing quite a lot of work in Java, and, despite their similar sounding
names and similarities in basic syntax, the latter two are very different in
many, perhaps most, other respects, including REs.  So REs are regularly one
of those areas I find myself having to refer back to manuals and API
documentation in particular cases, and at one point in one particular case
got so frustrated with the complexity of the REs I was trying to develop
that I spent some time creating a JS RE test page to help develop the code.
Ironically, the JS RE Test Page has blossomed into being quite a successful
page on my site, but the original page that caused it to be written is still
'under development'!-)

> > Pretty much the first item in that list was to declare constants at
> > the beginning of the program containing all the fixed or semi-fixed
> > values that the program needed
>
> Though that can put them a long way from their use, removing context
> from their definition and requiring it to be put back into their
> identifier instead.

True, but if that is starting to happen, then one of the other 'rules' was
to break a monolithic program into blocks  -  each of which has one
particular purpose, which testing has shown it to do well and without
error  -  and then build the program up by combining such blocks; because the
individual components are known to work, the wider program built from them
is likely to be more reliable.

Regards.




Re: New radio PIDs, more than 8 characters

2017-08-15 Thread James Scholes

C E Macfarlane wrote:

> Thinking about this a bit more, I wouldn't wish to claim a spurious hit was
> more likely with no upper limit, but nevertheless I would still regard it
> better programming practice to have one  -  with normal written English, the
> potential for spurious hits would be low, and in the event of one it would
> be delimited quickly by the next space, but if you were trawling raw HTML or
> similar code, which might contain long strings of pseudo-random characters
> as not just PIDs, but also GUIDs, session keys, and the like, then the
> potential for spurious hits would be very much increased, so more would be
> found, and in the interests of program efficiency you'd want them to be
> delimited sooner rather than later.


This is reasonable.  The regexp without an upper limit sourced from the 
BBC's code is used to confirm that a given string is formed only of 
characters from an acceptable set to make up a PID.  In most cases the 
string which is passed in is explicitly extracted from the request URL, 
as the application in question is a server-side, web-based one.  For 
such purposes I think the lack of an upper limit is completely 
acceptable, but if you're writing code to extract a valid PID from text 
of unknown length or complexity, the regexp probably is not very efficient.

--
James Scholes
http://twitter.com/JamesScholes



Re: New radio PIDs, more than 8 characters - "solved"

2017-08-15 Thread Ralph Corderoy
Hi C. E.,

> it might be necessary to bracket it at the beginning and end with
> non-capturing non-word meta- or pseudo-characters

Rather than \W, representing a single non-word character, \b would be
better, meaning a zero-width boundary between a word, \w, and non-word,
\W, character, or the start or end of the string.  /\Wfoo\W/ matches
":foo:", but not "foo", but /\bfoo\b/ matches both.  The zero-width
means it consumes nothing;  the test is of the character either side.
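
A small sketch to confirm the difference, using made-up sample strings:

```perl
use strict;
use warnings;

for my $s (":foo:", "foo", "food") {
    # \W must consume a real non-word character on each side.
    my $with_W = $s =~ /\Wfoo\W/ ? "match" : "no match";
    # \b only tests the positions either side and consumes nothing.
    my $with_b = $s =~ /\bfoo\b/ ? "match" : "no match";
    printf "%-6s  \\W: %-8s  \\b: %s\n", $s, $with_W, $with_b;
}
```

Only ":foo:" satisfies the `\W' form; the `\b' form accepts ":foo:" and
"foo" and rejects "food".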

> pseudo code
> if --url
>   strip characters following last /
>   use as pid
...
> ... particularly as URLs exist with other characters after the PID,
> though perhaps these might not be used in the context of GiP.

Yes, it's never that simple.  :-)  URLs have a defined structure and
encoding rules, and there's query parameters and fragments to consider.
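
As a rough sketch only: one way to cope is to drop the query and fragment
first, then look for a path segment that is a whole PID.  The helper name
`pid_from_url' and the "?foo=bar#top" suffix are invented for illustration,
and the pattern is the one agreed elsewhere in this thread:

```perl
use strict;
use warnings;

my $pid_re = qr/[bpw]\d[b-df-hj-np-tv-z\d]{6,13}/;

# Hypothetical helper: return the last path segment that is a whole PID.
sub pid_from_url {
    my ($url) = @_;
    (my $path = $url) =~ s/[?#].*//s;    # drop query string and fragment
    for my $segment (reverse split m{/}, $path) {
        return $segment if $segment =~ /^$pid_re$/;
    }
    return undef;
}

print pid_from_url("http://www.bbc.co.uk/programmes/p002w6r2/episodes/downloads"), "\n";
print pid_from_url("http://www.bbc.co.uk/programmes/b08xy0gl?foo=bar#top"), "\n";
```

This prints p002w6r2 and then b08xy0gl.  It does no percent-decoding, so it
is not a substitute for a proper URL parser.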

> Pretty much the first item in that list was to declare constants at
> the beginning of the program containing all the fixed or semi-fixed
> values that the program needed

Though that can put them a long way from their use, removing context
from their definition and requiring it to be put back into their
identifier instead.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy



RE: New radio PIDs, more than 8 characters

2017-08-14 Thread C E Macfarlane
Correction below:
--
www.macfh.co.uk/MacFH.html

> Yes, but in practice you'd want an upper limit, because the
> higher the limit
> the more likely you are to get spurious hits.

Thinking about this a bit more, I wouldn't wish to claim a spurious hit was
more likely with no upper limit, but nevertheless I would still regard it
better programming practice to have one  -  with normal written English, the
potential for spurious hits would be low, and in the event of one it would
be delimited quickly by the next space, but if you were trawling raw HTML or
similar code, which might contain long strings of pseudo-random characters
as not just PIDs, but also GUIDs, session keys, and the like, then the
potential for spurious hits would be very much increased, so more would be
found, and in the interests of program efficiency you'd want them to be
delimited sooner rather than later.

Regards.




RE: New radio PIDs, more than 8 characters

2017-08-14 Thread C E Macfarlane
See interleaved reply below ...
--
www.macfh.co.uk/MacFH.html

> >...
> >[bpw][0-9][a-z0-9]{7,13}
> >... would probably do the job.
> >

> The BBC version James Scholes gave seems to be much wider.
> Does {8,} mean
> at least 8 with no upper limit?

Yes, but in practice you'd want an upper limit, because the higher the limit
the more likely you are to get spurious hits.
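
To illustrate the quantifier only (the character class here is a
stand-in, not the BBC's actual pattern):

```perl
use strict;
use warnings;

my $open    = qr/^[a-z0-9]{8,}$/;      # at least 8, no upper limit
my $bounded = qr/^[a-z0-9]{8,15}$/;    # between 8 and 15 inclusive

for my $len (7, 8, 15, 40) {
    my $s = "a" x $len;
    printf "%2d chars  {8,}: %-3s  {8,15}: %s\n", $len,
        ($s =~ $open    ? "yes" : "no"),
        ($s =~ $bounded ? "yes" : "no");
}
```

The 40-character string is accepted by `{8,}' but rejected by `{8,15}'.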




Re: New radio PIDs, more than 8 characters

2017-08-14 Thread RS

From: C E Macfarlane
Sent: Sunday, August 13, 2017 20:54

> ...
> [bpw][0-9][a-z0-9]{7,13}
> ... would probably do the job.

The BBC version James Scholes gave seems to be much wider.  Does {8,} mean
at least 8 with no upper limit?







RE: New radio PIDs, more than 8 characters - "solved"

2017-08-14 Thread C E Macfarlane
More about regular expressions and programming follows, which those with no or 
little interest in either can safely ignore ...
-- 
www.macfh.co.uk/MacFH.html

> > I think what Charles was meaning is that if you were using 
> --url "http://www.bbc.co.uk/programmes/b08xy0gl" rather than 
> a direct PID then the code is looking for something starting 
> with either b, p or w followed by between 7 and 14 letters or 
> numbers and the first thing it hits that matches all that 
> criteria is the word "programmes". Like you say, GiP wouldn't 
> return any VPID info but as it finds programmes to be a valid 
> PID, it won't keep looking for the proper PID in that URL so 
> would never be able to download from a URL.

Yes, it depends very much on the intended use for the regular expression (RE).
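
A quick sketch of the false hit described in the quoted text.  The loose
pattern is reconstructed from that description ("b, p or w followed by
between 7 and 14 letters or numbers") and is an assumption; the tighter one
requires a digit in second place and uses the lower bound of 6 agreed
elsewhere in the thread:

```perl
use strict;
use warnings;

my $url = "http://www.bbc.co.uk/programmes/b08xy0gl";

# Loose: any of b, p, w followed by 7 to 14 letters or digits.
my ($loose) = $url =~ /([bpw][a-z0-9]{7,14})/;

# Tighter: the second character must be a digit.
my ($tight) = $url =~ /([bpw][0-9][a-z0-9]{6,13})/;

print "$loose\n";   # programmes
print "$tight\n";   # b08xy0gl
```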

The most general situation is trawling any text, such as HTML, WITHOUT REGARD 
TO CONTEXT for capturing something resembling a PID.  In this situation, 
probably even the correction I suggested may not be adequate, it might be 
necessary to bracket it at the beginning and end with non-capturing non-word 
meta- or pseudo-characters, the representation of which can sometimes differ 
from language to language but is usually \W, as it is in PERL, so ...
\W([bpw][0-9][a-z0-9]{7,13})\W
... should capture PIDs reasonably accurately without regard to context, though 
I wouldn't rely even on this without a deal of testing with many actual 
examples of text to be trawled.

However, if you already know something about the context, then of course that 
makes things easier.  The correction I suggested should pick PIDs out of URLs 
more elegantly and simply, in a single statement in fact, than either the 
original suggestion or programming to implement the following pseudo-code ...

>
> pseudo code
> if --url
>   strip characters following last /
>   use as pid
>   validate_pid
> end-if
>

... particularly as URLs exist with other characters after the PID, though 
perhaps these might not be used in the context of GiP.
 
> Anyway...
> changing all 7 occurrences

:-(

>   (sigh...)

I think in my case that would more likely have been '(expletive deleted)'!

>   of
> [bp]0[a-z0-9]{6}
> to
> (?:[bp]0[a-z0-9]{6}|w[a-z0-9]{7,14})
> solves the w3*, w1* problem for Me.

> Also. No disrespect intended to Dinkypumpkin as "he's" only picked-up
> existing code but, as an ex-programmer I'm horrified by the code
> repetition.  Doesn't Perl allow 'functions'?  i.e. if valid_pid ...
> where valid_pid contains said validation.

Yes, grateful though I am, probably along with all of us here, for GiP's 
wonderfully useful functionality, when I first looked at its code, I rejected 
any idea of contributing much actual programming suggestions, because I'd feel 
I had to completely rewrite the program rather than just tinker with it!

I can't remember where now, whether it was from a book, or a 6th form college 
or university course, but somewhere somehow I acquired a mental list of very 
basic things to get right when programming ...

Pretty much the first item in that list was to declare constants at the 
beginning of the program containing all the fixed or semi-fixed values that the 
program needed, so that if one of them changed, you only had to change the one 
easily-found line at the beginning where the value was declared, not the 
possibly tens, hundreds, even thousands of lines throughout the rest of the 
program where that value was used.  A template for BBC URLs and an RE to 
capture PIDs would both obviously be prime examples of this.

As you suggest, another, probably second or third on the list, was to put oft 
repeated code in subroutines/functions.
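
Both principles can be sketched in a few lines. This is Python for illustration only, not get_iplayer's own code: the names PROGRAMMES_URL, PID_RE, valid_pid and programme_url are hypothetical, the URL template is inferred from the programme URLs quoted in this thread, and the pattern is the one suggested earlier in the thread.

```python
import re

# Declared once, near the top of the program. If the BBC changes either,
# these two lines are the only ones that need editing.
PROGRAMMES_URL = "http://www.bbc.co.uk/programmes/{pid}"
PID_RE = re.compile(r'(?:[bp]0[a-z0-9]{6}|w[a-z0-9]{7,14})')

def valid_pid(candidate):
    """Return True if candidate is PID-shaped; the one place the rule lives."""
    return PID_RE.fullmatch(candidate) is not None

def programme_url(pid):
    """Build a programme page URL, refusing anything that is not PID-shaped."""
    if not valid_pid(pid):
        raise ValueError(f"not a valid PID: {pid!r}")
    return PROGRAMMES_URL.format(pid=pid)

print(programme_url("p05c2b9j"))
```

Every caller then writes `if valid_pid(...)` rather than repeating the regexp, so a change such as the new w PIDs touches one line instead of seven.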

When I got out into the 'real' world, I was appalled to find that code that 
disregarded most or all of the principles outlined in my mental list was 
actually widespread, perhaps even in the majority!  I sometimes think it's a 
near miracle that some programs ever run correctly at all!

Regards.




Re: New radio PIDs, more than 8 characters - "solved"

2017-08-14 Thread Ralph Corderoy
Hi M,

> I'm horrified by the code repetition.  Doesn't Perl allow 'functions'?

Yes, that's those

sub foo {
...
}

you see.

It can also hold a regexp in a variable so a `$pid_regexp' could be
defined once and used repeatedly.

$ perl -e '
> $re = qr/^(food|drink|famine)\d*$/;
> while (<>) {
> /$re/ and print "$. $_";
> }
> '
abc
food
2 food
drink42
3 drink42
xyz
$

BTW, given your private email, you might be interested to know the
Regular Expressions, of which regexps are an extension, are essentially
a "little language" for describing a regular grammar, level 3 in
Chomsky's hierarchy.  These are grammars that can be matched with a
finite-state automaton, and implementations are either
non-deterministic, like Perl's, or deterministic, like Go's.  As such,
they're a succinct way of expressing many text matching problems, just
as BNF is a convenient method for programming language grammars.  It's
interesting to compare the simple one above to the alternative long-hand
imperative programming form.
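
For that comparison, here is one possible long-hand version, sketched in Python rather than Perl. The function name matches_longhand is hypothetical, and the input lines are assumed to have had their trailing newline removed already.

```python
import re

# The regexp from the example above, for comparison.
RE = re.compile(r'^(food|drink|famine)\d*$')

def matches_longhand(line):
    """The same test as RE, spelled out step by step."""
    for word in ("food", "drink", "famine"):
        if line.startswith(word):
            rest = line[len(word):]
            # \d* followed by $: zero or more digits and nothing else.
            if all(ch in "0123456789" for ch in rest):
                return True
    return False

lines = ["abc", "food", "drink42", "xyz"]
for number, line in enumerate(lines, start=1):
    if matches_longhand(line):
        print(number, line)
```

This prints the same two numbered lines as the Perl run above, at the cost of a loop, a slice and a character-by-character check that the regexp states in one line.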

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy



Re: New radio PIDs, more than 8 characters

2017-08-14 Thread RS

From: Vangelis forthnet
Sent: Monday, August 14, 2017 03:45

> > The Al Gore one is now p05c2b9j
> > (snip) The changes may be temporary.
>
> ... Yes and no... If you browse to:
>
> http://www.bbc.co.uk/programmes/p016tl04/episodes/player
>
> you'll see there exist two distinct entries for "Former US Vice President
> Al Gore on Climate Change": http://www.bbc.co.uk/programmes/w172vg029mkl852
> (which looks more like the proper episode page) and
> http://www.bbc.co.uk/programmes/p05c2b9j


Curiously the duplication is only on the Available now tab.  The All and By 
date tabs only have the version with a 15-character w1 PID, and future 
programmes also have a 15-character w1 PID.


Looking through programmes added to get_iplayer's cache I can see quite a 
few World Service programmes from the past few days with w PIDs.  I haven't 
yet seen any podcasts with PIDs other than p0 (not that I have looked hard) 
and podcast PIDs are usually fairly similar to iPlayer PIDs.







Re: New radio PIDs, more than 8 characters

2017-08-14 Thread tellyaddict
> > But there is a potential problem with [a-z0-9] 
> > because it would pick out many normal English words, 
> > for example, ironically, 'programmes' and 'programming'.
> 
>  Again, forgive me for being obtuse, still learning things, 
> but isn't that scenario assuming one actually inputs 
> 
> --pid="programming" 
> 
> into a GiP command? (and in that case, even if the string 
> is validated as a PID, the playlist.json URL will not 
> return any vpid for that PID, hence no download...).

I think what Charles meant is that if you were using --url 
"http://www.bbc.co.uk/programmes/b08xy0gl" rather than a direct PID, then the 
code is looking for something starting with b, p or w followed by between 7 
and 14 letters or numbers, and the first thing it hits that matches all those 
criteria is the word "programmes". As you say, GiP wouldn't return any VPID 
info, but because it finds "programmes" to be a valid PID, it won't keep 
looking for the proper PID in that URL and so would never be able to download 
from a URL.
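
A minimal sketch of that behaviour, in Python for illustration. The pattern is the unanchored form of the one under discussion, and how get_iplayer itself searches a URL is an assumption here.

```python
import re

# b, p or w, then 7 to 14 lower-case letters or digits, with no anchors.
LOOSE = re.compile(r'[bpw][a-z0-9]{7,14}')

url = "http://www.bbc.co.uk/programmes/b08xy0gl"

first = LOOSE.search(url).group(0)  # what a "first match wins" lookup would use
every = LOOSE.findall(url)          # all non-overlapping matches, left to right

print(first)  # programmes
print(every)  # ['programmes', 'b08xy0gl']
```

The real PID is matched too, but only second, so code that stops at the first match never reaches it.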



Re: New radio PIDs, more than 8 characters

2017-08-13 Thread Vangelis forthnet
On Sun Aug 13 18:28:42 BST 2017, RS wrote: 


> The Al Gore one is now p05c2b9j
> (snip) 
> The changes may be temporary.


... Yes and no... If you browse to: 


http://www.bbc.co.uk/programmes/p016tl04/episodes/player

you'll see there exist two distinct entries for 
"Former US Vice President Al Gore on Climate Change": 
http://www.bbc.co.uk/programmes/w172vg029mkl852
(which looks more like the proper episode page) and 
http://www.bbc.co.uk/programmes/p05c2b9j


I haven't listened to them fully to compare them, 
both are listed as being 52min59sec in duration...
The next ("Trump... ") episode of the series is still 
only available as a "w" PID. And on 


http://www.bbc.co.uk/programmes/p029zl67/episodes/player

latest episode is still available as an 8-digit "w" PID :-(

The crux of the matter is that these "w"-PID progs 
are not recordable with the current iteration of GiP, 
not even when the progs are indexed inside the 
radio.cache and one uses " --get" to fetch! 
Haven't seen the figures for UK audiences, but 
I'll let you know WSR is very popular overseas! 

On Sun Aug 13 20:54:11 BST 2017, C E Macfarlane wrote: 


> > [bpw][a-z0-9]{14}
> (snip)
> Firstly it would have to be ... {8,14}, 
> otherwise shorter ones would not be picked up.


Rather "{7,14}", since the minimum character length 
of a PID is 8...


> But there is a potential problem with [a-z0-9] 
> because it would pick out many normal English words, 
> for example, ironically, 'programmes' and 'programming'.


Again, forgive me for being obtuse, still learning things, 
but isn't that scenario assuming one actually inputs 

--pid="programming" 

into a GiP command? (and in that case, even if the string 
is validated as a PID, the playlist.json URL will not 
return any vpid for that PID, hence no download...).


Best regards, 
Vangelis.




RE: Fw: Re: New radio PIDs, more than 8 characters

2017-08-13 Thread C E Macfarlane
See reply ...
--
www.macfh.co.uk/MacFH.html

> Would I be right in saying that amending Vangelis' suggestion to:
>  [bpw][a-z0-9]{14}
> would pick up 15 character PIDs as well as anything shorter?

Firstly it would have to be ... {8,14}, otherwise shorter ones would not be
picked up.

But there is a potential problem with [a-z0-9] because it would pick out
many normal English words, for example, ironically, 'programmes' and
'programming'.  I think you'd need the second character to be a digit, so
...
[bpw][0-9][a-z0-9]{7,13}
... would probably do the job.
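
The character counts are easy to get wrong, so here is a small check, sketched in Python for illustration. As written, one letter plus one digit plus {7,13} requires 9 to 15 characters in total, so an eight-character PID such as b08xy0gl (quoted elsewhere in this thread) falls just short. The {6,13} variant is an assumed adjustment, not something tested against real data.

```python
import re

CANDIDATES = {
    "exactly 14 after the first letter": r'[bpw][a-z0-9]{14}',
    "as quoted above": r'[bpw][0-9][a-z0-9]{7,13}',
    "assumed adjustment": r'[bpw][0-9][a-z0-9]{6,13}',
}
SAMPLES = ["b08xy0gl", "w172vg029mkl852", "programmes"]

def accepted(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in SAMPLES if re.fullmatch(pattern, s)]

for label, pattern in CANDIDATES.items():
    print(f"{label}: {accepted(pattern)}")
```

Only the last pattern accepts both the 8-character and the 15-character PID, and all three reject 'programmes'.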




Re: Fw: Re: New radio PIDs, more than 8 characters

2017-08-13 Thread Alan Milewczyk

On 13/08/2017 17:44, M Clark wrote:

> Thanks for coding help but, the problem is worse (for me...),
> PID w172vg029mkl852
> Business Matters - Former US Vice President Al Gore on Climate Change 
> (w172vg029mkl852)
>
> So not even 8 characters.  Other World Service Series (?) have similar PIDs 
> now, e.g. Weekend, World Update.
>
> And I thought I was the only one who listened to WS!
>
> M.



No you're not alone, I've been listening to the World Service (and, a 
long time ago, the European Service) for a while and thanks for pointing 
out this issue, I hadn't noticed it. I tend to do my file checking on a 
Monday for the preceding week.


I could follow the logic of  Vangelis' amended coding but I'm afraid I 
don't understand Ralph's. To be honest I'm not that bothered about 
coding that eliminates invalid PID combinations, I'm more interested in 
using something that picks them all up. Would I be right in saying that 
amending Vangelis' suggestion to:

[bpw][a-z0-9]{14}
would pick up 15 character PIDs as well as anything shorter?


Alan

PS maybe in the light of Richard's post just now, we need not worry!





Re: New radio PIDs, more than 8 characters

2017-08-13 Thread RS
From: Ralph Corderoy 
Sent: Sunday, August 13, 2017 17:54



> > Thanks for coding help but, the problem is worse (for me...), PID
> > w172vg029mkl852 Business Matters - Former US Vice President Al Gore on
> > Climate Change (w172vg029mkl852)


The Al Gore one is now p05c2b9j
The podcast is p05c2b9b

The podcast for the Trump warns one is p05c5r95

The changes may be temporary.





Re: New radio PIDs, more than 8 characters

2017-08-13 Thread Ralph Corderoy
Hi M,

> > ^(?:[bp]0|w3)[a-z0-9]{6}$
>
> Thanks for coding help but, the problem is worse (for me...), PID
> w172vg029mkl852 Business Matters - Former US Vice President Al Gore on
> Climate Change (w172vg029mkl852)

More samples would allow the regexp to reject invalid ones, but perhaps

^(?:[bp]0[a-z0-9]{6}|w[a-z0-9]{7,14})$

`{7,14}' means from seven to fourteen of the preceding thing, inclusive.
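
A quick check of that regexp against PIDs quoted in this thread, sketched in Python for illustration. The helper name is_pid is hypothetical, and "wednesday" is a made-up sample included to show how loose the w branch still is without more real examples.

```python
import re

PID = re.compile(r'^(?:[bp]0[a-z0-9]{6}|w[a-z0-9]{7,14})$')

def is_pid(text):
    """Return True if the whole of text is PID-shaped."""
    return PID.match(text) is not None

for sample in ["p05c2b9j", "w3csv1y9", "w172vg029mkl852", "programmes", "wednesday"]:
    print(sample, is_pid(sample))
```

The three real PIDs pass and "programmes" fails, but "wednesday" passes as well, since any w followed by seven to fourteen letters or digits is accepted.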

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy



Fw: Re: New radio PIDs, more than 8 characters

2017-08-13 Thread M Clark
Thanks for coding help but, the problem is worse (for me...),
PID w172vg029mkl852
Business Matters - Former US Vice President Al Gore on Climate Change 
(w172vg029mkl852)

So not even 8 characters.  Other World Service Series (?) have similar PIDs 
now, e.g. Weekend, World Update.

And I thought I was the only one who listened to WS!

M.


> Sent: Sunday, August 13, 2017 at 10:05 AM
> From: "Ralph Corderoy" 
> To: get_iplayer@lists.infradead.org
> Subject: Re: New radio PIDs
>
> Hi Vangelis,
> 
> > ...to begin with either "b0" or "p0".
> > 
> >  New radio PIDs like "w3csv1y9" or "w3csvnyc", beginning with "w3",
> ...
> > [bp]0[a-z0-9]{6}
> > with
> > [bpw][a-z0-9]{7}
> 
> Other approaches, getting gradually more specific.
> 
> ^[bpw][03][a-z0-9]{6}$
> But this allows b3.
> 
> ^(b0|p0|w3)[a-z0-9]{6}$
> This is precise, but it's common to factor out alternations since
> each is tried in turn, so...
> 
> ^([bp]0|w3)[a-z0-9]{6}$
> This is as precise.
> 
> The remaining problem is the `()' "capture" what matches for retrieval
> by the program afterwards as $1, $2, ...  By introducing another set of
> `()' we'd have affected the position of any that come afterwards in the
> same regexp.  (None in this case.)  It's also inefficient to capture
> when it's unnecessary.  `()' can be marked as non-capturing with `?:'.
> 
> ^(?:[bp]0|w3)[a-z0-9]{6}$
> 
> These regexps aren't specific to Perl, BTW, but are useful with egrep,
> awk, Python, etc.
> 
> -- 
> Cheers, Ralph.
> https://plus.google.com/+RalphCorderoy
> 
> 
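
The point in the quoted message about capturing and non-capturing groups can be sketched as follows, in Python for illustration (group numbering works the same way as Perl's $1, $2). The second pair of brackets around the last six characters is added here only to show how the numbering shifts; it is not in the quoted regexps.

```python
import re

# Two capturing groups: the prefix becomes group 1, the remainder group 2.
CAPTURING = re.compile(r'^([bp]0|w3)([a-z0-9]{6})$')

# Prefix group marked non-capturing with ?: so the remainder becomes group 1.
NON_CAPTURING = re.compile(r'^(?:[bp]0|w3)([a-z0-9]{6})$')

pid = "w3csv1y9"
print(CAPTURING.match(pid).groups())      # ('w3', 'csv1y9')
print(NON_CAPTURING.match(pid).groups())  # ('csv1y9',)
```

Both patterns accept exactly the same strings; only what is remembered afterwards differs.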
