Re: [vdr] xmltv2vdr speedup and modification

2007-02-14 Thread Sebastien Lucas

On 2/13/07, [EMAIL PROTECTED]
[EMAIL PROTECTED] wrote:



 But it didn't help at all with my benchmark.
 ...
 For information that change has no impact on my bench.

Interesting, what version of Perl are you running if those
changes don't do anything?



vdr26:~/xmltv# time ./xmltv2vdrv5.pl -s -c channels.conf -x tvguide.xml

real    3m4.397s
user    2m50.475s
sys     0m6.052s

vdr26:~/xmltv# time ./xmltv2vdrv6.pl -s -c channels.conf -x tvguide.xml

real    3m4.309s
user    2m48.951s
sys     0m7.240s

xmltv2vdrv5 = the version I posted
xmltv2vdrv6 = the version I posted + the /o switch on all regexes; title
and subtitle now use a regex instead of split.

About Perl (from Debian sarge):
vdr26:~/xmltv# perl --version

This is perl, v5.8.4 built for i386-linux-thread-multi



A further improvement is that it is now unnecessary to read the whole
XML file into memory, as the file is scanned through linearly. So no
need to waste 5MB of memory if you are short of it.

--
# Read all the XMLTV stuff into memory - quicker parsing
open(XMLTV, $xmltvfile) || die "cannot open xmltv file";
@xmllines=<XMLTV>;
close(XMLTV);

sub ProcessEpg
# Find XML events

foreach $xmlline (@xmllines)

--

=>
open(XMLTV, $xmltvfile) || die "cannot open xmltv file";

sub ProcessEpg

while($xmlline = <XMLTV>)



Good idea, I had not thought of it (my wonderful Celeron 233 has
384 MB of RAM).

Thanks for your help.

Sébastien

___
vdr mailing list
vdr@linuxtv.org
http://www.linuxtv.org/cgi-bin/mailman/listinfo/vdr


[vdr] xmltv2vdr speedup and modification

2007-02-13 Thread Morfsta

Hi,

I'm the original author of xml2vdr (for my sins!), thanks for working on it
and improving the performance.

If everyone is happy with this new code release and it really improves the
speed I'll wrap it up and make a formal release and get Klaus to add it to
the VDR FTP site.

As a lot of people seem to use xml2vdr, perhaps it would be good to
resurrect it and keep it formally updated?

Regards,

Morfsta



On 2/12/07, Sebastien Lucas [EMAIL PROTECTED] wrote:


Hi,

I recently worked on xmltv2vdr.pl (version 1.0.6) and checked why it
was so slow on my mighty Celeron 233. So I modified it a little to
avoid reading the whole xmltv file for each channel defined in
channels.conf. The result is good: I can process my 5 MB xmltv file in
less than 10 minutes, whereas it took at least 1 hour with the vanilla
1.0.6 release.

I also added support for sub-title in the xmltv file (I think someone
already posted about that on the list).

I have only used it for one week, so it may still be buggy. I'll be happy to
take care of any bug found.

Hope this helps those with old hardware like me.

Sebastien



Re: [vdr] xmltv2vdr speedup and modification

2007-02-13 Thread Sebastien Lucas

On 2/12/07, [EMAIL PROTECTED]
[EMAIL PROTECTED] wrote:

 I recently worked on xmltv2vdr.pl (version 1.0.6) and checked
 why it was so slow on my mighty Celeron 233. So I modified it
 a little to avoid reading the whole xmltv file for each channel
 defined in channels.conf. The result is good: I can process
 my 5 MB xmltv file in less than 10 minutes, whereas it took at least
 1 hour with the vanilla 1.0.6 release.

Some more things you can do (just from looking at the source you provided)..

--

Caching of xmltime2vdr, like

return $timecache{$xmltime}->{$skew} if defined
$timecache{$xmltime}->{$skew};
$secs = Date::Manip::UnixDate($xmltime, "%s") + $skew*60;
$timecache{$xmltime}->{$skew} = $secs;
return $secs;

But it depends on how often this function is called. A hash lookup is probably
faster than running UnixDate from the library, so it is a memory tradeoff.


I still haven't tested it but I doubt it'll help. I'll check later.


I see that there are still some basic Perl-level optimizations for this.

For example, the code loops through the @xmllines array, and on every iteration
you recompile *ALL* regexps. That is as many times as @xmllines has lines.
And if one recompile takes 1ms, you waste @xmllines * 1ms just on
compiling and not doing anything useful.

The Perl /o switch is the compile-once flag; use it wherever it is
possible. A variable is not a problem unless it changes in every iteration.


[]

I didn't know that (I'm not really a perl guru ... far from it). I'll
update my version. But it didn't help at all with my benchmark.



--

As $xmlline is matched against regexps many times, you should experiment
with study $xmlline; after chomp $xmlline. Study builds internal search tables
for string matches. So see which way the code is faster, with study or without
it. Use the Unix shell's time command for this. For an extra boost with study you
would probably need to remove the subroutine xmltvtranslate, because $xmlline
is copied into the subroutine's parameter space and the copy is what gets matched,
so study would not affect it. So instead of calling $xmlline=xmltvtranslate($xmlline);
cut and paste the subroutine's code here, and use $xmlline instead of $line.

foreach $xmlline (@xmllines)
{
chomp $xmlline;
study $xmlline;
$xmlline=~s/und uuml;/ü/go;
$xmlline...

This isn't pretty but could probably help a bit. You save the cost of calling the
subroutine once per line of @xmllines, and study would help you a lot as you
match against the same string all the time.



I'll check that later.



For a constant string you could use ' ' instead of " ". " " causes the string to be
evaluated for variables

if ( $chanCur eq "" ) --> if ( $chanCur eq '' )

But this would be very minor effect..



I'll surely be too lazy to test that. sorry.



Split is heavy operation because of creating arrays, but you can limit it.

( $null, $xmlst, $null, $xmlet, @null ) = split(/\"/, $xmlline);

=> ( $null, $xmlst, $null, $xmlet, $null ) = split(/\"/, $xmlline, 5);

or even using regexp for this. I don't know input line for this, but if it is
foo,something,something,...

($xmlst,$xmlet) = $xmlline =~ m:\"(.*?)\",\"(.*?)\":o;

or probably combine 2 regexp to a single

($xmlst,$xmlet,$channel) = $xmlline =~
m:\"(.*?)\",\"(.*?)\".*?channel=\"(.*?)\":o;

--

Again something very weird:

if ( ($xmlline =~ /\<title/ ) )
{
#print $xmlline . "\n";
( $null, $tmp ) = split(/\>/, $xmlline);
( $vdrtitle, @null ) = split(/\</, $tmp);

# Send VDR Title

SVDRPsend("T $vdrtitle");
}

Why not?

SVDRPsend("T $1") if $xmlline =~ m:\<title\>(.*?)\</title\>:o;

Same for XML subtitle
SVDRPsend("T $1") if $xmlline =~ m:\<sub-title\>(.*?)\</sub-title\>:o;


Yes, I also prefer shorter code. I'll check further whether something
like <title lang="en"> is also allowed, to adapt the regex. For
information, that change has no impact on my bench.


Generally
   if ( ($xmlline =~ /\<desc/ ) && ( $desccount == $dc ))
{
( $null, $tmp ) = split(/\>/, $xmlline);
( $vdrdesc, @null ) = split(/\</, $tmp);

this is not a clever way to parse XML data in Perl. Just use regexps, which
match strings with the Boyer-Moore algorithm (same as Unix grep) and compile once.



Agree. I'll try to modify it.



Some logical errors

if ( ($xmlline =~ /\<programme/ ) && ( $xmlline !~ /clumpidx=\"1\/2\"/ ) && ( $chanevent == 0 ) )

=> if ( ( $chanevent == 0 ) && ($xmlline =~ /\<programme/ ) && ( $xmlline !~ /clumpidx=\"1\/2\"/ ) )

so program execution can skip much faster if $chanevent != 0,
because the regexps would not be run. This is normal short-circuit evaluation.



In fact the check $chanevent == 0 is only useful if the XML is not
well formed, so it doesn't change anything.


Then
elsif ( $chanCur ne $chan )
{
SVDRPsend("c");
SVDRPsend(".");
SVDRPreceive(250);

I think programmer wanted output of . -command, and see if it's 

Re: [vdr] xmltv2vdr speedup and modification

2007-02-13 Thread Sebastien Lucas

On 2/13/07, Morfsta [EMAIL PROTECTED] wrote:

Hi,

I'm the original author of xml2vdr (for my sins!), thanks for working on it
and improving the performance.


I know and thanks for it.


 If everyone is happy with this new code release and it really improves the
speed I'll wrap it up and make a formal release and get Klaus to add it to
the VDR FTP site.


I'd be happy if it goes that way.


As a lot of people seem to use xml2vdr, perhaps it would be good to
resurrect it and keep it formally updated?


Yes, there were also some interesting posts on this mailing list about xmltv2vdr.

Thanks for your post.

Sebastien



RE: [vdr] xmltv2vdr speedup and modification

2007-02-13 Thread jori.hamalainen
 

 But it didn't help at all with my benchmark.
 ...
 For information that change has no impact on my bench.

Interesting, what version of Perl are you running if those
changes don't do anything?

---

A further improvement is that it is now unnecessary to read the whole
XML file into memory, as the file is scanned through linearly. So no
need to waste 5MB of memory if you are short of it.

--
# Read all the XMLTV stuff into memory - quicker parsing
open(XMLTV, $xmltvfile) || die "cannot open xmltv file";
@xmllines=<XMLTV>;
close(XMLTV);

sub ProcessEpg
# Find XML events

foreach $xmlline (@xmllines)

--

=>
open(XMLTV, $xmltvfile) || die "cannot open xmltv file";

sub ProcessEpg

while($xmlline = <XMLTV>)
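
For illustration, a minimal sketch of the line-by-line variant. It swaps in a
lexical filehandle and three-argument open in place of the bareword XMLTV
handle, and count_programmes is a hypothetical helper that is not part of
xmltv2vdr.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: count <programme> elements without slurping the file.
sub count_programmes {
    my ($xmltvfile) = @_;
    open(my $fh, '<', $xmltvfile)
        or die "cannot open xmltv file $xmltvfile: $!";
    my $count = 0;
    # Reading from the handle in the while condition fetches one line per
    # iteration, so only the current line is held in memory.
    while (my $xmlline = <$fh>) {
        chomp $xmlline;
        $count++ if $xmlline =~ /<programme/;
    }
    close($fh);
    return $count;
}
```

Memory use then stays roughly constant however large the guide file is, at the
cost of not being able to loop over the same lines a second time without
reopening the file.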



[vdr] xmltv2vdr speedup and modification

2007-02-12 Thread Sebastien Lucas

Hi,

I recently worked on xmltv2vdr.pl (version 1.0.6) and checked why it
was so slow on my mighty Celeron 233. So I modified it a little to
avoid reading the whole xmltv file for each channel defined in
channels.conf. The result is good: I can process my 5 MB xmltv file in
less than 10 minutes, whereas it took at least 1 hour with the vanilla
1.0.6 release.

I also added support for sub-title in the xmltv file (I think someone
already posted about that on the list).

I have only used it for one week, so it may still be buggy. I'll be happy to
take care of any bug found.

Hope this helps those with old hardware like me.

Sebastien


xmltv2vdrv5.pl
Description: Binary data


RE: [vdr] xmltv2vdr speedup and modification

2007-02-12 Thread jori.hamalainen
 I recently worked on xmltv2vdr.pl (version 1.0.6) and checked
 why it was so slow on my mighty Celeron 233. So I modified it
 a little to avoid reading the whole xmltv file for each channel
 defined in channels.conf. The result is good: I can process
 my 5 MB xmltv file in less than 10 minutes, whereas it took at least
 1 hour with the vanilla 1.0.6 release.

Some more things you can do (just from looking at the source you provided)..

--

Caching of xmltime2vdr, like

return $timecache{$xmltime}->{$skew} if defined
$timecache{$xmltime}->{$skew};
$secs = Date::Manip::UnixDate($xmltime, "%s") + $skew*60;
$timecache{$xmltime}->{$skew} = $secs;
return $secs;

But it depends on how often this function is called. A hash lookup is probably
faster than running UnixDate from the library, so it is a memory tradeoff.
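
A self-contained sketch of that cache, so the effect can be tried without
Date::Manip installed. slow_convert is a hypothetical stand-in for the
UnixDate call (it ignores the timezone offset and treats the timestamp as
UTC), and $conversions exists only to show how often the real conversion runs.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %timecache;
my $conversions = 0;    # counts real conversions, for illustration only

# Hypothetical stand-in for the Date::Manip::UnixDate call. It ignores the
# timezone offset and treats the timestamp as UTC.
sub slow_convert {
    my ($xmltime) = @_;
    $conversions++;
    my ($y, $mo, $d, $h, $mi, $s) =
        $xmltime =~ /^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)/
        or die "unexpected time format: $xmltime";
    return timegm($s, $mi, $h, $d, $mo - 1, $y);
}

sub xmltime2vdr {
    my ($xmltime, $skew) = @_;
    # exists() rather than defined(), so a cached result is always reused.
    unless (exists $timecache{$xmltime}{$skew}) {
        $timecache{$xmltime}{$skew} = slow_convert($xmltime) + $skew * 60;
    }
    return $timecache{$xmltime}{$skew};
}

# Three lookups of the same time trigger only one real conversion.
print xmltime2vdr('20070214200000 +0100', 0), "\n" for 1 .. 3;
print "conversions: $conversions\n";
```

Whether this pays off depends on how often the same start or stop time
repeats in the guide; consecutive programmes on one channel share a boundary,
so some reuse seems likely.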

--

I see that there are still some basic Perl-level optimizations for this.

For example, the code loops through the @xmllines array, and on every iteration
you recompile *ALL* regexps. That is as many times as @xmllines has lines.
And if one recompile takes 1ms, you waste @xmllines * 1ms just on
compiling and not doing anything useful.

The Perl /o switch is the compile-once flag; use it wherever it is
possible. A variable is not a problem unless it changes in every iteration.

# New XML Program - doesn't handle split programs yet
if ( ($xmlline =~ /\<programme/o ) && ( $xmlline !~ /clumpidx=\"1\/2\"/o ) && ( $chanevent == 0 ) )
{
  ( $null, $xmlst, $null, $xmlet, @null ) = split(/\"/, $xmlline);
  ( $chan ) = ( $xmlline =~ m/channel\=\"(.*?)\"/o );
...
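
One caveat worth checking against perlop: Perl compiles a pattern that
interpolates no variables once, when the script itself is compiled, so /o only
changes anything for patterns containing a variable. A sketch of the
alternative, precompiling with qr// and reusing the compiled patterns, is
below; programme_channel and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Patterns are compiled once here and reused for every line.
my $re_programme = qr/<programme/;
my $re_clump     = qr/clumpidx="1\/2"/;
my $re_channel   = qr/channel="(.*?)"/;

# Hypothetical helper: return the channel id of a new programme line,
# or nothing if the line is not a programme start or is a split programme.
sub programme_channel {
    my ($xmlline) = @_;
    return unless $xmlline =~ $re_programme;
    return if $xmlline =~ $re_clump;
    my ($chan) = $xmlline =~ $re_channel;
    return $chan;
}
```

qr// is most useful when part of the pattern comes from a variable, for
example a channel id read from channels.conf, because the compiled object can
then be built once outside the loop.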

And all the lines in the subroutine xmltvtranslate should have the o flag.
$line=~s/ und uuml;/ü/go;
$line=~s/ und auml;/ä/go;
$line=~s/ und ouml;/ö/go;

and you are running the same substitution twice for the umlauts, with and without
the leading space. You don't need to run it with the space if you also run it without.
$line=~s/ und auml;/ä/go;  - this is unnecessary because the later one will match.

$line=~s/und auml;/ä/go;

--

As $xmlline is matched against regexps many times, you should experiment
with study $xmlline; after chomp $xmlline. Study builds internal search tables
for string matches. So see which way the code is faster, with study or without
it. Use the Unix shell's time command for this. For an extra boost with study you
would probably need to remove the subroutine xmltvtranslate, because $xmlline
is copied into the subroutine's parameter space and the copy is what gets matched,
so study would not affect it. So instead of calling $xmlline=xmltvtranslate($xmlline);
cut and paste the subroutine's code here, and use $xmlline instead of $line.

foreach $xmlline (@xmllines)
{
chomp $xmlline;
study $xmlline;
$xmlline=~s/und uuml;/ü/go;
$xmlline...

This isn't pretty but could probably help a bit. You save the cost of calling the
subroutine once per line of @xmllines, and study would help you a lot as you
match against the same string all the time.
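
Besides the shell's time command, the core Benchmark module can compare the
two variants in isolation. The sketch below times a translate subroutine
against the same substitution inlined; the sample lines and the names with_sub
and inlined are made up for illustration. It leaves study out, since perlfunc
describes study as a no-op from Perl 5.16 onwards, so it would only matter on
older Perls such as 5.8.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Made-up sample input: 1000 title lines containing one escaped umlaut.
my @xmllines = map { "<title>M und uuml;nchen $_</title>\n" } 1 .. 1000;

sub xmltvtranslate {
    my $line = shift;
    $line =~ s/und uuml;/ü/g;
    return $line;
}

# Variant 1: translate through the subroutine, as the script does.
sub with_sub {
    my $n = 0;
    foreach my $xmlline (@xmllines) {
        my $copy = xmltvtranslate($xmlline);
        $n++ if $copy =~ /<title/;
    }
    return $n;
}

# Variant 2: the same substitution pasted into the loop body.
sub inlined {
    my $n = 0;
    foreach my $xmlline (@xmllines) {
        my $copy = $xmlline;
        $copy =~ s/und uuml;/ü/g;
        $n++ if $copy =~ /<title/;
    }
    return $n;
}

# Run each variant 200 times and print the timings.
timethese(200, { with_sub => \&with_sub, inlined => \&inlined });
```

The numbers only describe this toy input; the real script should still be
timed end to end on the actual guide file.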

--

For a constant string you could use ' ' instead of " ". " " causes the string to be
evaluated for variables

if ( $chanCur eq "" ) --> if ( $chanCur eq '' )

But this would be very minor effect..

--

Split is heavy operation because of creating arrays, but you can limit it.

( $null, $xmlst, $null, $xmlet, @null ) = split(/\"/, $xmlline);

=> ( $null, $xmlst, $null, $xmlet, $null ) = split(/\"/, $xmlline, 5);

or even using regexp for this. I don't know input line for this, but if it is
foo,something,something,...

($xmlst,$xmlet) = $xmlline =~ m:\"(.*?)\",\"(.*?)\":o;

or probably combine 2 regexp to a single

($xmlst,$xmlet,$channel) = $xmlline =~
m:\"(.*?)\",\"(.*?)\".*?channel=\"(.*?)\":o;
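
Both variants can be tried on a sample line. The programme element below is an
assumed example of the XMLTV input with start, stop and channel attributes in
that order; the actual grabber output may order them differently, in which
case the combined regex would need adjusting.

```perl
use strict;
use warnings;

# Assumed sample input line, not taken from a real guide file.
my $xmlline = '<programme start="20070214200000 +0100" '
            . 'stop="20070214210000 +0100" channel="C1.example">';

# Variant 1: split on the double quote, but stop after five fields.
my (undef, $st1, undef, $et1) = split(/"/, $xmlline, 5);

# Variant 2: one regex that captures start, stop and channel together.
my ($st2, $et2, $chan) =
    $xmlline =~ /start="(.*?)"\s+stop="(.*?)".*?channel="(.*?)"/;

print "$st2 -> $et2 on $chan\n";
```

According to perlfunc, split assigned to a list of scalars with no LIMIT
already stops at one more field than there are variables, so it is the
trailing @null array in the script that forces the whole line to be split.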

--

Again something very weird:

if ( ($xmlline =~ /\<title/ ) )
{
#print $xmlline . "\n";
( $null, $tmp ) = split(/\>/, $xmlline);
( $vdrtitle, @null ) = split(/\</, $tmp);

# Send VDR Title

SVDRPsend("T $vdrtitle");
}

Why not?

SVDRPsend("T $1") if $xmlline =~ m:\<title\>(.*?)\</title\>:o;

Same for XML subtitle
SVDRPsend("T $1") if $xmlline =~ m:\<sub-title\>(.*?)\</sub-title\>:o;
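
A runnable sketch of the one-line form, widened to tolerate attributes on the
tag such as lang="en" (an assumption about what the input may contain).
SVDRPsend here is a stub that only records the command, and send_titles is a
hypothetical wrapper. The sketch sends the sub-title with S, which is the
short-text tag in VDR's EPG data format, where the lines above use T for both.

```perl
use strict;
use warnings;

my @sent;

# Stub standing in for the script's SVDRPsend; it only records the command.
sub SVDRPsend { push @sent, $_[0]; return; }

# Hypothetical wrapper around the two one-line matches.
sub send_titles {
    my ($xmlline) = @_;
    # [^>]* allows optional attributes between the tag name and '>'.
    SVDRPsend("T $1") if $xmlline =~ m:<title[^>]*>(.*?)</title>:;
    SVDRPsend("S $1") if $xmlline =~ m:<sub-title[^>]*>(.*?)</sub-title>:;
    return;
}

send_titles('<title lang="en">News</title>');
send_titles('<sub-title>Evening edition</sub-title>');
print "$_\n" for @sent;
```

This still assumes the opening and closing tag sit on the same line, as the
split-based code does.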

Generally
   if ( ($xmlline =~ /\<desc/ ) && ( $desccount == $dc ))
{
( $null, $tmp ) = split(/\>/, $xmlline);
( $vdrdesc, @null ) = split(/\</, $tmp);

this is not a clever way to parse XML data in Perl. Just use regexps, which
match strings with the Boyer-Moore algorithm (same as Unix grep) and compile once.

--

Some logical errors

if ( ($xmlline =~ /\<programme/ ) && ( $xmlline !~ /clumpidx=\"1\/2\"/ ) && ( $chanevent == 0 ) )

=> if ( ( $chanevent == 0 ) && ($xmlline =~ /\<programme/ ) && ( $xmlline !~ /clumpidx=\"1\/2\"/ ) )

so program execution can skip much faster if $chanevent != 0,
because the regexps would not be run. This is normal short-circuit evaluation.
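
The effect of the reordering can be made visible with a counter. In this
sketch is_programme wraps the regex so that each evaluation is counted;
is_programme, starts_event and $regex_runs are hypothetical names used only
for the demonstration.

```perl
use strict;
use warnings;

my $regex_runs = 0;

# Wrapper that counts how often the regex is actually evaluated.
sub is_programme {
    my ($xmlline) = @_;
    $regex_runs++;
    return $xmlline =~ /<programme/ ? 1 : 0;
}

sub starts_event {
    my ($chanevent, $xmlline) = @_;
    # && stops at the first false operand, so the cheap numeric test
    # goes first and the regex is skipped whenever $chanevent != 0.
    return ( $chanevent == 0 && is_programme($xmlline) ) ? 1 : 0;
}

starts_event(1, '<programme start="1">');   # regex not evaluated
starts_event(0, '<programme start="1">');   # regex evaluated once
print "regex evaluated $regex_runs time(s)\n";
```

How much this saves depends on how often $chanevent is non-zero when the
test is reached.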


Then
elsif ( $chanCur ne $chan )
{
SVDRPsend("c");
SVDRPsend(".");
SVDRPreceive(250);

I think programmer