[Mlt-devel] vdpau performance

2012-02-04 Thread Maksym Veremeyenko

Hi,

I am trying to use VDPAU for MPEG-2 decoding. I wrote a test patch that
enables VDPAU initialisation for MPEG-2.


It works, but the CPU load is still high.

Enabling VDPAU reduces the CPU load by only 15 percentage points (81% with
VDPAU, 96% without).


Am I doing something wrong?

P.S.
I am testing with 1920x1080 4:2:0 MPEG-2 files.

--

Maksym Veremeyenko
From a5b2d1e0f3ddbb4167ac4ecb9e4a69edf8bab083 Mon Sep 17 00:00:00 2001
From: Maksym Veremeyenko ve...@m1stereo.tv
Date: Sat, 4 Feb 2012 17:54:40 +0200
Subject: [PATCH] mpeg2 against vdpau probe

---
 src/modules/avformat/producer_avformat.c |   21 +++--
 src/modules/avformat/vdpau.c |   28 
 2 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/src/modules/avformat/producer_avformat.c b/src/modules/avformat/producer_avformat.c
index c94080d..b349b4c 100644
--- a/src/modules/avformat/producer_avformat.c
+++ b/src/modules/avformat/producer_avformat.c
@@ -1597,6 +1597,7 @@ static int producer_get_image( mlt_frame frame, uint8_t **buffer, mlt_image_form
 // Decode the image
 if ( must_decode || int_position >= req_position )
 {
+#if 0
 #ifdef VDPAU
 	if ( self->vdpau )
 	{
@@ -1607,6 +1608,7 @@ static int producer_get_image( mlt_frame frame, uint8_t **buffer, mlt_image_form
 		self->vdpau->is_decoded = 0;
 	}
 #endif
+#endif
 	codec_context->reordered_opaque = pkt.pts;
 	if ( int_position >= req_position )
 		codec_context->skip_loop_filter = AVDISCARD_NONE;
@@ -1835,23 +1837,30 @@ static int video_codec_init( producer_avformat self, int index, mlt_properties p
 		AVCodecContext *codec_context = stream->codec;
 
 		// Find the codec
-		AVCodec *codec = avcodec_find_decoder( codec_context->codec_id );
+		AVCodec *codec = NULL;
 #ifdef VDPAU
-		if ( codec_context->codec_id == CODEC_ID_H264 )
+		char* vdpau_codec = NULL;
+		switch ( codec_context->codec_id )
+		{
+			case CODEC_ID_H264:		vdpau_codec = "h264_vdpau"; break;
+			case CODEC_ID_MPEG2VIDEO:	vdpau_codec = "mpegvideo_vdpau"; break;
+		};
+
+		if ( vdpau_codec )
 		{
-			if ( ( codec = avcodec_find_decoder_by_name( "h264_vdpau" ) ) )
+			if ( ( codec = avcodec_find_decoder_by_name( vdpau_codec ) ) )
 			{
 				if ( vdpau_init( self ) )
 				{
 					self->video_codec = codec_context;
-					if ( !vdpau_decoder_init( self ) )
+					if ( !vdpau_decoder_init( self, codec_context->codec_id ) )
 						vdpau_fini( self );
 				}
 			}
-			if ( !self->vdpau )
-				codec = avcodec_find_decoder( codec_context->codec_id );
 		}
+		if ( !self->vdpau )
 #endif
+		codec = avcodec_find_decoder( codec_context->codec_id );
 
 		// Initialise multi-threading
 		int thread_count = mlt_properties_get_int( properties, "threads" );
diff --git a/src/modules/avformat/vdpau.c b/src/modules/avformat/vdpau.c
index b5710b3..c7f7b13 100644
--- a/src/modules/avformat/vdpau.c
+++ b/src/modules/avformat/vdpau.c
@@ -126,7 +126,7 @@ static int vdpau_init( producer_avformat self )
 
 static enum PixelFormat vdpau_get_format( struct AVCodecContext *s, const enum PixelFormat *fmt )
 {
-	return PIX_FMT_VDPAU_H264;
+	return *fmt;
 }
 
 static int vdpau_get_buffer( AVCodecContext *codec_context, AVFrame *frame )
@@ -152,14 +152,14 @@ static int vdpau_get_buffer( AVCodecContext *codec_context, AVFrame *frame )
 			frame->reordered_opaque = codec_context->reordered_opaque;
 			if ( frame->reference )
 			{
-				frame->age = self->vdpau->ip_age[0];
+				//frame->age = self->vdpau->ip_age[0];
 				self->vdpau->ip_age[0] = self->vdpau->ip_age[1] + 1;
 				self->vdpau->ip_age[1] = 1;
 				self->vdpau->b_age++;
 			}
 			else
 			{
-				frame->age = self->vdpau->b_age;
+				//frame->age = self->vdpau->b_age;
 				self->vdpau->ip_age[0] ++;
 				self->vdpau->ip_age[1] ++;
 				self->vdpau->b_age = 1;
@@ -219,21 +219,33 @@ static void vdpau_draw_horiz( AVCodecContext *codec_context, const AVFrame *fram
 	}
 }
 
-static int vdpau_decoder_init( producer_avformat self )
+static int vdpau_decoder_init( producer_avformat self, int codec_id )
 {
 	mlt_log_debug( MLT_PRODUCER_SERVICE(self->parent), "vdpau_decoder_init\n" );
 	int success = 1;
-	
+	VdpDecoderProfile profile;
+
 	self->video_codec->opaque = self;
 	self->video_codec->get_format = vdpau_get_format;
 	self->video_codec->get_buffer = vdpau_get_buffer;
 	self->video_codec->release_buffer = vdpau_release_buffer;
 	self->video_codec->draw_horiz_band = vdpau_draw_horiz;
 	self->video_codec->slice_flags = SLICE_FLAG_CODED_ORDER | SLICE_FLAG_ALLOW_FIELD;
-	self->video_codec->pix_fmt = PIX_FMT_VDPAU_H264;
-	
-	VdpDecoderProfile profile = VDP_DECODER_PROFILE_H264_HIGH;
+
 	uint32_t max_references = self->video_codec->refs;
+
+	switch(codec_id)
+	{
+		case CODEC_ID_H264:
+			self->video_codec->pix_fmt = PIX_FMT_VDPAU_H264;
+			profile = VDP_DECODER_PROFILE_H264_HIGH;
+			break;
+		case CODEC_ID_MPEG2VIDEO:
+			self->video_codec->pix_fmt = PIX_FMT_VDPAU_MPEG2;
+			profile = VDP_DECODER_PROFILE_MPEG2_MAIN;
+			break;
+	};
+
 	pthread_mutex_lock( mlt_sdl_mutex 

Re: [Mlt-devel] vdpau performance

2012-02-04 Thread Dan Dennedy
2012/2/4 Maksym Veremeyenko ve...@m1stereo.tv:
> Hi,
>
> i am trying to use vdpau for mpeg2 decoding. i did a test patch that enable
> vdpau init for mpeg2.
>
> it works, but it still high cpu load.
>
> enabling vdpau gives only 15% of cpu load decrease (81% with vdpau 96%
> without vdpau)
>
> am i doing something wrong?

Unlike other media players that only need to display via X11, MLT does
not use the P in VDPAU (presentation). Rather, it must fetch the
decoded images from video memory back into system memory, and that is
a bottleneck. That means it really only offers a benefit for H.264 on
older or low CPU powered devices. Someone is working on a set of GLSL
filters, and then it may have some more benefit by offloading enough
processing to justify the overhead of memory transfers.
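
To put a rough number on the transfer described above, here is a minimal
sketch that only counts the bytes copied back per second. The 1920x1080
4:2:0 geometry comes from the original post; 8 bits per sample and the
frame rate passed in are assumptions, and the function names are
hypothetical (they are not MLT or VDPAU APIs). It ignores any stall while
waiting for the decoder and any pixel format conversion, so it is a lower
bound on the work, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one 8-bit 4:2:0 frame: a full-resolution luma plane plus two
 * chroma planes, each subsampled by two horizontally and vertically.
 * Assumes even dimensions. */
static uint64_t yuv420_frame_bytes(uint32_t width, uint32_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    uint64_t luma = (uint64_t)width * height;
    uint64_t chroma = 2 * ((uint64_t)(width / 2) * (height / 2));
    return luma + chroma;
}

/* Bytes per second copied from video memory to system memory when every
 * decoded frame is read back. fps is an assumed value, not one taken
 * from the thread. */
static uint64_t readback_bytes_per_second(uint32_t width, uint32_t height,
                                          uint32_t fps)
{
    return yuv420_frame_bytes(width, height) * fps;
}
```

For 1920x1080 this gives 3,110,400 bytes per frame, so an assumed 25
frames per second means roughly 78 MB copied back every second before
any filter runs.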

-- 
+-DRD-+

___
Mlt-devel mailing list
Mlt-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mlt-devel


Re: [Mlt-devel] vdpau performance

2012-02-04 Thread Dan Dennedy
On Sat, Feb 4, 2012 at 10:03 AM, Dan Dennedy d...@dennedy.org wrote:
> 2012/2/4 Maksym Veremeyenko ve...@m1stereo.tv:
>> Hi,
>>
>> i am trying to use vdpau for mpeg2 decoding. i did a test patch that enable
>> vdpau init for mpeg2.
>>
>> it works, but it still high cpu load.
>>
>> enabling vdpau gives only 15% of cpu load decrease (81% with vdpau 96%
>> without vdpau)
>>
>> am i doing something wrong?
>
> Unlike other media players that only need to display via X11, MLT does
> not use the P in VDPAU (presentation). Rather, it must fetch the
> decoded images from video memory back into system memory, and that is
> a bottleneck. That means it really only offers a benefit for H.264 on
> older or low CPU powered devices. Someone is working on a set of GLSL
> filters, and then it may have some more benefit by offloading enough
> processing to justify the overhead of memory transfers.

I added a FAQ topic for this:
http://www.mltframework.org/twiki/bin/view/MLT/Questions#Why_does_VDPAU_not_seem_to_offer

-- 
+-DRD-+



Re: [Mlt-devel] [PATCH] use sse2 instruction for line compositing

2012-02-04 Thread Dan Dennedy
2012/2/3 Maksym Veremeyenko ve...@m1stereo.tv:
> 02.02.12 18:57, Maksym Veremeyenko wrote:
>>
>> Hi,
>>
>> attached patch perform line compositing for SSE2+ARCH_X86_64 build. It
>> works for a case where luma is not defined...
>
>
> updated patch attached

If I am not mistaken, this change reduces precision to 8 pixels. The
existing transition is already limited to 2-pixel precision, which I
am not happy about. I do not want to reduce the precision further,
give different results depending on the CPU, and effectively introduce
a regression as far as the user is concerned. Maybe we should limit it
to apply only when the width is a multiple of 8. Then it would still be
used for fullscreen composites at most profiles' resolutions.
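
A minimal sketch of that guard, assuming a simple weighted blend rather
than the actual composite code. All three function names are
hypothetical, and blend_line_x8 is plain C standing in for an SSE2
kernel that consumes 8 pixels per iteration. The point is only the
dispatch: the fast path runs when the width is a non-zero multiple of 8,
and everything else falls back to the per-pixel path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference path: blend one sample at a time. weight is 0..255. */
static void blend_line_scalar(uint8_t *dest, const uint8_t *src,
                              size_t width, uint8_t weight)
{
    for (size_t i = 0; i < width; i++)
        dest[i] = (uint8_t)((src[i] * weight + dest[i] * (255 - weight)) / 255);
}

/* Stand-in for a SIMD kernel: consumes exactly 8 samples per iteration,
 * so it is only valid when width is a multiple of 8. */
static void blend_line_x8(uint8_t *dest, const uint8_t *src,
                          size_t width, uint8_t weight)
{
    assert(width % 8 == 0);
    for (size_t i = 0; i < width; i += 8)
        for (size_t j = 0; j < 8; j++)
            dest[i + j] = (uint8_t)((src[i + j] * weight
                                     + dest[i + j] * (255 - weight)) / 255);
}

/* Dispatcher: returns 1 if the 8-wide path was taken, 0 if the scalar
 * fallback was used. */
static int blend_line(uint8_t *dest, const uint8_t *src,
                      size_t width, uint8_t weight)
{
    if (width > 0 && width % 8 == 0) {
        blend_line_x8(dest, src, width, weight);
        return 1;
    }
    blend_line_scalar(dest, src, width, weight);
    return 0;
}
```

With this shape, a 1920-pixel line takes the 8-wide path, while any
width that is not a multiple of 8 is blended per pixel.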

-- 
+-DRD-+
