Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver
On 1/12/07, Nick Piggin <[EMAIL PROTECTED]> wrote:
> Jaya Kumar wrote:
> > - write so get page_mkwrite where we add this page to a list
> > - also schedules a workqueue task to be run after a delay
> > - app continues writing to that page with no additional cost
> > - the workqueue task comes in and unmaps the pages on the list, then
> >   completes the work associated with updating the framebuffer
>
> Have you thought about implementing a traditional write-back cache using
> the dirty bits, rather than unmapping the page?

Ah, sorry, I erred in my description. I'm not unmapping pages, I'm
calling page_mkclean which uses the dirty bits.

Thanks,
jaya
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver
On Fri, 12 Jan 2007 08:15:45 +0100 Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> How about implementing the sync_page() aop?

That got deleted in Jens's tree - the unplugging rework.
Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver
On Thu, 2007-01-11 at 19:22 -0500, Jaya Kumar wrote:

> Agreed. Though I may be misunderstanding what you mean by first-touch.
> Currently, I do a schedule_delayed_work and leave 1s between when the
> page_mkwrite callback indicating the first touch is received and when
> the deferred IO is processed to actually deliver the data to the
> display. I picked 1s because it rounds up the display latency. I
> imagine increasing the delay further may make it miss some desirable
> display activity. For example, a slider indicating progress of music
> may be slower than optimal. Perhaps I should make the delay a module
> parameter and leave the choice to the user?

How about implementing the sync_page() aop? Then you could force the
flush using msync(MS_SYNC).

Hmm... that might require more surgery but the idea would work I think.
Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver
Jaya Kumar wrote:
> On 1/11/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > That's all very interesting.
> >
> > Please don't dump a bunch of new implementation concepts like this on us
> > with no description of what it does, why it does it and why it does it in
> > this particular manner.
>
> Hi Andrew,
>
> Actually, I didn't dump without description. :-) I had posted an RFC
> and an explanation of the design to the lists. Here's an archive link
> to that post.
> http://marc.theaimsgroup.com/?l=linux-kernel&m=116583546411423&w=2
> I wasn't sure whether to include that description with the patch email
> because it was long.
>
> From that email:
> ---
> This is there in order to hide the latency
> associated with updating the display (500ms to 800ms). The method used
> is to fake a framebuffer in memory. Then use pagefaults followed by delayed
> unmapping and only then do the actual framebuffer update. To explain this
> better, the usage scenario is like this:
>
> - userspace app like Xfbdev mmaps framebuffer
> - driver handles and sets up nopage and page_mkwrite handlers
> - app tries to write to mmaped vaddress
> - get pagefault and reaches driver's nopage handler
> - driver's nopage handler finds and returns physical page ( no
>   actual framebuffer )
> - write so get page_mkwrite where we add this page to a list
> - also schedules a workqueue task to be run after a delay
> - app continues writing to that page with no additional cost
> - the workqueue task comes in and unmaps the pages on the list, then
>   completes the work associated with updating the framebuffer

Have you thought about implementing a traditional write-back cache using
the dirty bits, rather than unmapping the page?

--
SUSE Labs, Novell Inc.
Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver
On Thu, 11 Jan 2007 19:22:45 -0500 "Jaya Kumar" <[EMAIL PROTECTED]> wrote:

> On 1/11/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > That's all very interesting.
> >
> > Please don't dump a bunch of new implementation concepts like this on us
> > with no description of what it does, why it does it and why it does it in
> > this particular manner.
>
> Hi Andrew,
>
> Actually, I didn't dump without description. :-) I had posted an RFC
> and an explanation of the design to the lists. Here's an archive link
> to that post.
> http://marc.theaimsgroup.com/?l=linux-kernel&m=116583546411423&w=2
> I wasn't sure whether to include that description with the patch email
> because it was long.

Yes, please always include the full description as an integral part of the
patch. In fact, it's very often best to communicate this information
permanently, via code comments.

> From that email:
> ---
> This is there in order to hide the latency
> associated with updating the display (500ms to 800ms). The method used
> is to fake a framebuffer in memory. Then use pagefaults followed by delayed
> unmapping and only then do the actual framebuffer update. To explain this
> better, the usage scenario is like this:
>
> - userspace app like Xfbdev mmaps framebuffer
> - driver handles and sets up nopage and page_mkwrite handlers
> - app tries to write to mmaped vaddress
> - get pagefault and reaches driver's nopage handler
> - driver's nopage handler finds and returns physical page ( no
>   actual framebuffer )
> - write so get page_mkwrite where we add this page to a list
> - also schedules a workqueue task to be run after a delay
> - app continues writing to that page with no additional cost
> - the workqueue task comes in and unmaps the pages on the list, then
>   completes the work associated with updating the framebuffer
> - app tries to write to the address (that has now been unmapped)
> - get pagefault and the above sequence occurs again
>
> The desire is roughly to allow bursty framebuffer writes to occur.
> Then after some time when hopefully things have gone quiet, we go and
> really update the framebuffer. For this type of nonvolatile high latency
> display, the desired image is the final image rather than intermediate
> stages which is why it's okay to not update for each write that is
> occurring.

OK, makes sense. The whole idea is neat.

> > What is the "theory of operation" here?
> >
> > Presumably this is a performance optimisation to permit batching of the
> > copying from user memory into the framebuffer card? If so, how much
> > performance does it gain?
>
> Yes, you are right. Updating the E-Ink display currently requires
> about 500ms - 800ms. It is a non-volatile display and as such it is
> typically used in a manner where only the final image is important. As
> a result, being able to avoid the bursts of IO associated with screen
> activity and only write the final result is attractive.
>
> I have not done any performance benchmarks. I'm not sure exactly what
> to compare. I imagine in one case would be using write() to deliver
> the image updates and the other case would be mmap(), memcpy(). The
> latter would win because it's hiding all the intermediate "writes".
>
> > I expect the benefit will be large, and could be increased if you were to
> > add a small delay between first-touch and writeback to the display. Let's
> > talk about that a bit.
>
> Agreed. Though I may be misunderstanding what you mean by first-touch.

First modification - when the page goes from clean to dirty (and
page_mkwrite gets called)

> Currently, I do a schedule_delayed_work and leave 1s between when the
> page_mkwrite callback indicating the first touch is received and when
> the deferred IO is processed to actually deliver the data to the
> display.

oh, doh - I missed the fact that you're already adding a delay.

> I picked 1s because it rounds up the display latency. I
> imagine increasing the delay further may make it miss some desirable
> display activity. For example, a slider indicating progress of music
> may be slower than optimal. Perhaps I should make the delay a module
> parameter and leave the choice to the user?

Don't know - your call. It would be interesting to know if this trick is
applicable to any other framebuffer drivers.

> > Is the optimisation applicable to other drivers? If so, should it be
> > generalised into library code somewhere?
>
> I think the deferred IO code would be useful to devices that have slow
> updates and where only the final result is important. So far, this is
> the only device I've encountered that has this characteristic.

OK.

> > I guess the export of page_mkclean() makes sense for this application.
> >
> > The use of lock_page_nosync() is wrong. It can still sleep, and here it's
> > inside spinlock. And we don't want to export __lock_page_nosync() to
> > modules. I suggest you convert the list locking here to a mutex and use
> > lock_page().
Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver
On 1/11/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> That's all very interesting.
>
> Please don't dump a bunch of new implementation concepts like this on us
> with no description of what it does, why it does it and why it does it in
> this particular manner.

Hi Andrew,

Actually, I didn't dump without description. :-) I had posted an RFC
and an explanation of the design to the lists. Here's an archive link
to that post.
http://marc.theaimsgroup.com/?l=linux-kernel&m=116583546411423&w=2
I wasn't sure whether to include that description with the patch email
because it was long.

From that email:
---
This is there in order to hide the latency
associated with updating the display (500ms to 800ms). The method used
is to fake a framebuffer in memory. Then use pagefaults followed by delayed
unmapping and only then do the actual framebuffer update. To explain this
better, the usage scenario is like this:

- userspace app like Xfbdev mmaps framebuffer
- driver handles and sets up nopage and page_mkwrite handlers
- app tries to write to mmaped vaddress
- get pagefault and reaches driver's nopage handler
- driver's nopage handler finds and returns physical page ( no
  actual framebuffer )
- write so get page_mkwrite where we add this page to a list
- also schedules a workqueue task to be run after a delay
- app continues writing to that page with no additional cost
- the workqueue task comes in and unmaps the pages on the list, then
  completes the work associated with updating the framebuffer
- app tries to write to the address (that has now been unmapped)
- get pagefault and the above sequence occurs again

The desire is roughly to allow bursty framebuffer writes to occur.
Then after some time when hopefully things have gone quiet, we go and
really update the framebuffer. For this type of nonvolatile high latency
display, the desired image is the final image rather than intermediate
stages which is why it's okay to not update for each write that is
occurring.
---

> What is the "theory of operation" here?
>
> Presumably this is a performance optimisation to permit batching of the
> copying from user memory into the framebuffer card? If so, how much
> performance does it gain?

Yes, you are right. Updating the E-Ink display currently requires
about 500ms - 800ms. It is a non-volatile display and as such it is
typically used in a manner where only the final image is important. As
a result, being able to avoid the bursts of IO associated with screen
activity and only write the final result is attractive.

I have not done any performance benchmarks. I'm not sure exactly what
to compare. I imagine in one case would be using write() to deliver
the image updates and the other case would be mmap(), memcpy(). The
latter would win because it's hiding all the intermediate "writes".

> I expect the benefit will be large, and could be increased if you were to
> add a small delay between first-touch and writeback to the display. Let's
> talk about that a bit.

Agreed. Though I may be misunderstanding what you mean by first-touch.
Currently, I do a schedule_delayed_work and leave 1s between when the
page_mkwrite callback indicating the first touch is received and when
the deferred IO is processed to actually deliver the data to the
display. I picked 1s because it rounds up the display latency. I
imagine increasing the delay further may make it miss some desirable
display activity. For example, a slider indicating progress of music
may be slower than optimal. Perhaps I should make the delay a module
parameter and leave the choice to the user?

> Is the optimisation applicable to other drivers? If so, should it be
> generalised into library code somewhere?

I think the deferred IO code would be useful to devices that have slow
updates and where only the final result is important. So far, this is
the only device I've encountered that has this characteristic.

> I guess the export of page_mkclean() makes sense for this application.
>
> The use of lock_page_nosync() is wrong. It can still sleep, and here it's
> inside spinlock. And we don't want to export __lock_page_nosync() to
> modules. I suggest you convert the list locking here to a mutex and use
> lock_page().

Oops, sorry about that. I will correct it.

Thanks,
jayakumar
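The usage scenario in this message (the first write to a clean page is noticed, a delayed flush is armed, further writes to the same page cost nothing, and the flush cleans the pages and updates the display once) can be modelled in plain userspace C. This is only a sketch of the idea, not the driver code: a logical clock stands in for the workqueue, and every name in it (defio_sim, defio_write, defio_tick, SIM_PAGES, SIM_DELAY) is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGES 15	/* assumption: 60000-byte framebuffer, 4096-byte pages */
#define SIM_DELAY 10	/* assumption: flush delay in logical ticks */

struct defio_sim {
	bool dirty[SIM_PAGES];	/* stands in for the per-page dirty state */
	bool armed;		/* is a delayed flush pending? */
	unsigned long deadline;	/* tick at which the pending flush fires */
	unsigned long now;	/* logical clock */
	unsigned int faults;	/* simulated page_mkwrite callbacks */
	unsigned int flushes;	/* simulated display updates */
};

void defio_init(struct defio_sim *s)
{
	memset(s, 0, sizeof(*s));
}

/* A write to a clean page "faults" once; writes to a dirty page are free. */
void defio_write(struct defio_sim *s, size_t page)
{
	assert(page < SIM_PAGES);
	if (s->dirty[page])
		return;
	s->dirty[page] = true;
	s->faults++;
	/* Only the first touch arms the timer; later touches do not extend it. */
	if (!s->armed) {
		s->armed = true;
		s->deadline = s->now + SIM_DELAY;
	}
}

/* Advance the clock; at the deadline, clean every page and flush once. */
void defio_tick(struct defio_sim *s, unsigned long ticks)
{
	s->now += ticks;
	if (s->armed && s->now >= s->deadline) {
		memset(s->dirty, 0, sizeof(s->dirty));
		s->armed = false;
		s->flushes++;
	}
}
```

In this model a burst of a hundred writes to one page produces a single fault and a single flush, which is the batching effect being discussed. After the flush the page is clean again, so the next write faults and re-arms the timer.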
Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver
On Thu, 11 Jan 2007 15:24:27 +0100 Jaya Kumar <[EMAIL PROTECTED]> wrote:

> +/* this is to find and return the vmalloc-ed fb pages */
> +static struct page* hecubafb_vm_nopage(struct vm_area_struct *vma,
> +					unsigned long vaddr, int *type)
> +{
> +	unsigned long offset;
> +	struct page *page;
> +	struct fb_info *info = vma->vm_private_data;
> +
> +	offset = (vaddr - vma->vm_start) + (vma->vm_pgoff << PAGE_SHIFT);
> +	if (offset >= (DPY_W*DPY_H)/8)
> +		return NOPAGE_SIGBUS;
> +
> +	page = vmalloc_to_page(info->screen_base + offset);
> +	if (!page)
> +		return NOPAGE_OOM;
> +
> +	get_page(page);
> +	if (type)
> +		*type = VM_FAULT_MINOR;
> +	return page;
> +}
> +
> +static void hecubafb_work(struct work_struct *work)
> +{
> +	struct hecubafb_par *par = container_of(work, struct hecubafb_par,
> +						deferred_work.work);
> +	struct list_head *node, *next;
> +	struct page_list *cur;
> +
> +	/* here we unmap the pages, then do all deferred IO */
> +	spin_lock(&par->lock);
> +	list_for_each_safe(node, next, &par->pagelist) {
> +		cur = list_entry(node, struct page_list, list);
> +		list_del(node);
> +		lock_page_nosync(cur->page);
> +		page_mkclean(cur->page);
> +		unlock_page(cur->page);
> +		kfree(cur);
> +	}
> +	spin_unlock(&par->lock);
> +	hecubafb_dpy_update(par);
> +}
> +
> +static int hecubafb_page_mkwrite(struct vm_area_struct *vma,
> +					struct page *page)
> +{
> +	struct fb_info *info = vma->vm_private_data;
> +	struct hecubafb_par *par = info->par;
> +	struct page_list *new;
> +
> +	/* this is a callback we get when userspace first tries to
> +	write to the page. we schedule a workqueue. that workqueue
> +	will eventually unmap the touched pages and execute the
> +	deferred framebuffer IO. then if userspace touches a page
> +	again, we repeat the same scheme */
> +
> +	new = kzalloc(sizeof(struct page_list), GFP_KERNEL);
> +	if (!new)
> +		return -ENOMEM;
> +	new->page = page;
> +
> +	/* protect against the workqueue changing the page list */
> +	spin_lock(&par->lock);
> +	list_add(&new->list, &par->pagelist);
> +	spin_unlock(&par->lock);
> +
> +	/* come back in 1s to process the deferred IO */
> +	schedule_delayed_work(&par->deferred_work, HZ);
> +	return 0;
> +}

That's all very interesting.

Please don't dump a bunch of new implementation concepts like this on us
with no description of what it does, why it does it and why it does it in
this particular manner.

What is the "theory of operation" here?

Presumably this is a performance optimisation to permit batching of the
copying from user memory into the framebuffer card? If so, how much
performance does it gain?

I expect the benefit will be large, and could be increased if you were to
add a small delay between first-touch and writeback to the display. Let's
talk about that a bit.

Is the optimisation applicable to other drivers? If so, should it be
generalised into library code somewhere?

I guess the export of page_mkclean() makes sense for this application.

The use of lock_page_nosync() is wrong. It can still sleep, and here it's
inside spinlock. And we don't want to export __lock_page_nosync() to
modules. I suggest you convert the list locking here to a mutex and use
lock_page().
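The locking point in this review (a call that may sleep must not run under a spinlock, so the list should be guarded by a mutex instead) has a direct userspace analogue. The sketch below uses a pthread mutex and a plain singly linked list; it is not the kernel fix itself. The names dirty_list, dirty_list_add, dirty_list_drain and the clean_page stub are hypothetical, with clean_page standing in for the lock_page(), page_mkclean(), unlock_page() sequence from the thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct page_entry {
	struct page_entry *next;
	unsigned long pfn;	/* stand-in for the driver's struct page * */
};

struct dirty_list {
	pthread_mutex_t lock;	/* sleeping lock: blocking while held is allowed */
	struct page_entry *head;
};

unsigned long pages_cleaned;	/* counts calls to the stub below */

/* Stub for the per-page work; the real calls may sleep. */
static void clean_page(unsigned long pfn)
{
	(void)pfn;
	pages_cleaned++;
}

/* First write to a page: remember it for the next flush. */
int dirty_list_add(struct dirty_list *dl, unsigned long pfn)
{
	struct page_entry *e;

	assert(dl != NULL);
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->pfn = pfn;

	pthread_mutex_lock(&dl->lock);
	e->next = dl->head;
	dl->head = e;
	pthread_mutex_unlock(&dl->lock);
	return 0;
}

/* Worker: clean and free every recorded page; returns how many there were. */
size_t dirty_list_drain(struct dirty_list *dl)
{
	size_t n = 0;

	assert(dl != NULL);
	pthread_mutex_lock(&dl->lock);
	while (dl->head) {
		struct page_entry *e = dl->head;

		dl->head = e->next;
		clean_page(e->pfn);	/* may block; fine under a mutex */
		free(e);
		n++;
	}
	pthread_mutex_unlock(&dl->lock);
	return n;
}
```

The trade-off is that writers adding to the list wait while a drain is in progress, which is acceptable here because both sides are allowed to sleep.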
[PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver
This patch adds support for the Hecuba/E-Ink display with deferred IO. I
welcome your feedback and advice.

Signed-off-by: Jaya Kumar <[EMAIL PROTECTED]>

---

 drivers/video/Kconfig    |   13 +
 drivers/video/Makefile   |    1
 drivers/video/hecubafb.c |  568 +++
 mm/filemap.c             |    1
 mm/rmap.c                |    1
 5 files changed, 584 insertions(+)

---

diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index 4e83f01..cf2dc50 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -568,6 +568,19 @@ config FB_IMAC
 	help
 	  This is the frame buffer device driver for the Intel-based Macintosh
 
+config FB_HECUBA
+	tristate "Hecuba board support"
+	depends on FB && X86 && MMU
+	select FB_CFB_FILLRECT
+	select FB_CFB_COPYAREA
+	select FB_CFB_IMAGEBLIT
+	help
+	  This enables support for the Hecuba board. This driver was tested
+	  with an E-Ink 800x600 display and x86 SBCs through a 16 bit GPIO
+	  interface (8 bit data, 4 bit control). If you anticipate using
+	  this driver, say Y or M; otherwise say N. You must specify the
+	  GPIO IO address to be used for setting control and data.
+
 config FB_HGA
 	tristate "Hercules mono graphics support"
 	depends on FB && X86
diff --git a/drivers/video/Makefile b/drivers/video/Makefile
index 309a26d..b4d5655 100644
--- a/drivers/video/Makefile
+++ b/drivers/video/Makefile
@@ -67,6 +67,7 @@ obj-$(CONFIG_FB_SGIVW)            += sgivwfb.o
 obj-$(CONFIG_FB_ACORN)            += acornfb.o
 obj-$(CONFIG_FB_ATARI)            += atafb.o
 obj-$(CONFIG_FB_MAC)              += macfb.o
+obj-$(CONFIG_FB_HECUBA)           += hecubafb.o
 obj-$(CONFIG_FB_HGA)              += hgafb.o
 obj-$(CONFIG_FB_IGA)              += igafb.o
 obj-$(CONFIG_FB_APOLLO)           += dnfb.o
diff --git a/drivers/video/hecubafb.c b/drivers/video/hecubafb.c
new file mode 100644
index 000..4740b92
--- /dev/null
+++ b/drivers/video/hecubafb.c
@@ -0,0 +1,568 @@
+/*
+ * linux/drivers/video/hecubafb.c -- FB driver for Hecuba controller
+ *
+ * Copyright (C) 2006, Jaya Kumar
+ * This work was sponsored by CIS(M) Sdn Bhd
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file COPYING in the main directory of this archive for
+ * more details.
+ *
+ * Layout is based on skeletonfb.c by James Simmons and Geert Uytterhoeven.
+ * This work was possible because of apollo display code from E-Ink's website
+ * http://support.eink.com/community
+ * All information used to write this code is from public material made
+ * available by E-Ink on its support site. Some commands such as 0xA4
+ * were found by looping through cmd=0x00 thru 0xFF and supplying random
+ * values. There are other commands that the display is capable of,
+ * beyond the 5 used here but they are more complex.
+ *
+ * This driver is written to be used with the Hecuba display controller
+ * board, and tested with the EInk 800x600 display in 1 bit mode.
+ * The interface between Hecuba and the host is TTL based GPIO. The
+ * GPIO requirements are 8 writable data lines and 6 lines for control.
+ * Only 4 of the controls are actually used here but 6 for future use.
+ * The driver requires the IO addresses for data and control GPIO at
+ * load time. It is also possible to use this display with a standard
+ * PC parallel port.
+ *
+ * General notes:
+ * - User must set hecubafb_enable=1 to enable it
+ * - User must set dio_addr=0xIOADDR cio_addr=0xIOADDR c2io_addr=0xIOADDR
+ *
+ */
+
+#include <asm/uaccess.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/fb.h>
+#include <linux/init.h>
+#include <linux/platform_device.h>
+#include <linux/list.h>
+
+/* to support deferred IO */
+#include <linux/rmap.h>
+#include <linux/pagemap.h>
+
+/* Apollo controller specific defines */
+#define APOLLO_START_NEW_IMG	0xA0
+#define APOLLO_STOP_IMG_DATA	0xA1
+#define APOLLO_DISPLAY_IMG	0xA2
+#define APOLLO_ERASE_DISPLAY	0xA3
+#define APOLLO_INIT_DISPLAY	0xA4
+
+/* Hecuba interface specific defines */
+/* WUP is inverted, CD is inverted, DS is inverted */
+#define HCB_NWUP_BIT	0x01
+#define HCB_NDS_BIT	0x02
+#define HCB_RW_BIT	0x04
+#define HCB_NCD_BIT	0x08
+#define HCB_ACK_BIT	0x80
+
+/* Display specific information */
+#define DPY_W 600
+#define DPY_H 800
+
+struct hecubafb_par {
+	struct delayed_work deferred_work;
+	unsigned long dio_addr;
+	unsigned long cio_addr;
+	unsigned long c2io_addr;
+	unsigned char ctl;
+	atomic_t ref_count;
+	atomic_t vma_count;
+	struct fb_info *info;
+	unsigned int irq;
+	spinlock_t lock;
+	struct list_head pagelist;
+};
+
+struct page_list {
+	struct list_head list;
+	struct page *page;
+};
+
+static struct fb_fix_screeninfo hecubafb_fix __initdata = {
+	.id =		"hecubafb",
+	.type =		FB_TYPE_PACKED_PIXELS,
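The DPY_W and DPY_H defines in the patch, together with the 1 bit mode mentioned in its header comment, fix the size of the in-memory framebuffer and therefore how many pages the deferred IO code can ever track. The arithmetic is sketched below under two assumptions that the patch excerpt does not state: 4096-byte pages, and rows packed with no padding (600 / 8 = 75 bytes per row). The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DPY_W 600		/* from the patch */
#define DPY_H 800		/* from the patch */
#define SIM_PAGE_SIZE 4096u	/* assumption: 4 KiB pages, as is usual on x86 */

/* Framebuffer size at 1 bit per pixel: 600 * 800 / 8 bytes. */
size_t fb_size_bytes(void)
{
	return (size_t)DPY_W * DPY_H / 8;
}

/* Number of pages needed to back the framebuffer, rounded up. */
size_t fb_page_count(void)
{
	return (fb_size_bytes() + SIM_PAGE_SIZE - 1) / SIM_PAGE_SIZE;
}

/* Same bound as the "offset >= (DPY_W*DPY_H)/8" test in the nopage handler. */
bool fb_offset_in_range(size_t offset)
{
	return offset < fb_size_bytes();
}

/* Byte that holds pixel (x, y), assuming rows are packed with no padding. */
size_t fb_pixel_byte_offset(unsigned int x, unsigned int y)
{
	assert(x < DPY_W && y < DPY_H);
	return ((size_t)y * DPY_W + x) / 8;
}

/* Page that a given byte offset falls in. */
size_t fb_offset_to_page(size_t offset)
{
	return offset / SIM_PAGE_SIZE;
}
```

Under these assumptions the framebuffer is 60000 bytes spread over 15 pages, so a full-screen redraw dirties at most 15 pages before the delayed update runs.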
Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver
On 1/11/07, Andrew Morton [EMAIL PROTECTED] wrote: That's all very interesting. Please don't dump a bunch of new implementation concepts like this on us with no description of what it does, why it does it and why it does it in this particular manner. Hi Andrew, Actually, I didn't dump without description. :-) I had posted an RFC and an explanation of the design to the lists. Here's an archive link to that post. http://marc.theaimsgroup.com/?l=linux-kernelm=116583546411423w=2 I wasn't sure whether to include that description with the patch email because it was long. From that email: --- This is there in order to hide the latency associated with updating the display (500ms to 800ms). The method used is to fake a framebuffer in memory. Then use pagefaults followed by delayed unmaping and only then do the actual framebuffer update. To explain this better, the usage scenario is like this: - userspace app like Xfbdev mmaps framebuffer - driver handles and sets up nopage and page_mkwrite handlers - app tries to write to mmaped vaddress - get pagefault and reaches driver's nopage handler - driver's nopage handler finds and returns physical page ( no actual framebuffer ) - write so get page_mkwrite where we add this page to a list - also schedules a workqueue task to be run after a delay - app continues writing to that page with no additional cost - the workqueue task comes in and unmaps the pages on the list, then completes the work associated with updating the framebuffer - app tries to write to the address (that has now been unmapped) - get pagefault and the above sequence occurs again The desire is roughly to allow bursty framebuffer writes to occur. Then after some time when hopefully things have gone quiet, we go and really update the framebuffer. For this type of nonvolatile high latency display, the desired image is the final image rather than intermediate stages which is why it's okay to not update for each write that is occuring. 
---

> What is the theory of operation here? Presumably this is a performance optimisation to permit batching of the copying from user memory into the framebuffer card? If so, how much performance does it gain?

Yes, you are right. Updating the E-Ink display currently requires about 500ms - 800ms. It is a non-volatile display and as such it is typically used in a manner where only the final image is important. As a result, being able to avoid the bursts of IO associated with screen activity and only write the final result is attractive.

I have not done any performance benchmarks. I'm not sure exactly what to compare. I imagine one case would be using write() to deliver the image updates and the other case would be mmap(), memcpy(). The latter would win because it's hiding all the intermediate writes.

> I expect the benefit will be large, and could be increased if you were to add a small delay between first-touch and writeback to the display. Let's talk about that a bit.

Agreed. Though I may be misunderstanding what you mean by first-touch. Currently, I do a schedule_delayed_work and leave 1s between when the page_mkwrite callback indicating the first touch is received and when the deferred IO is processed to actually deliver the data to the display. I picked 1s because it rounds up the display latency. I imagine increasing the delay further may make it miss some desirable display activity. For example, a slider indicating progress of music may be slower than optimal. Perhaps I should make the delay a module parameter and leave the choice to the user?

> Is the optimisation applicable to other drivers? If so, should it be generalised into library code somewhere?

I think the deferred IO code would be useful to devices that have slow updates and where only the final result is important. So far, this is the only device I've encountered that has this characteristic.

> I guess the export of page_mkclean() makes sense for this application. The use of lock_page_nosync() is wrong.
> It can still sleep, and here it's inside spinlock. And we don't want to export __lock_page_nosync() to modules. I suggest you convert the list locking here to a mutex and use lock_page().

Oops, sorry about that. I will correct it.

Thanks,
jayakumar
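The usage scenario described in this message can be illustrated with a small userspace model. This is only a sketch and not the driver's code: the class `DeferredFb` and its methods are hypothetical names, a page "fault" is modelled as a flag check, and the delayed workqueue task is modelled as an explicit method call.

```python
class DeferredFb:
    """Toy userspace model of the deferred-IO flow; not kernel code."""

    def __init__(self, num_pages):
        self.writable = [False] * num_pages  # False: the next write faults
        self.dirty_list = []                 # pages queued on first write
        self.work_scheduled = False          # stands in for the delayed work
        self.faults = 0                      # writes that took the slow path
        self.flushed = []                    # one entry per display update

    def write(self, page):
        """Model a store to a mmapped framebuffer page."""
        if not self.writable[page]:
            # First touch: corresponds to the page_mkwrite step.
            self.faults += 1
            self.writable[page] = True
            self.dirty_list.append(page)
            self.work_scheduled = True
        # Later writes to the same page take no fault at all.

    def run_deferred_work(self):
        """Model the workqueue task that fires after the delay."""
        if not self.work_scheduled:
            return
        pages = sorted(self.dirty_list)
        for page in pages:
            self.writable[page] = False  # the next write will fault again
        self.dirty_list.clear()
        self.work_scheduled = False
        self.flushed.append(pages)       # the slow display update happens once


fb = DeferredFb(num_pages=4)
for _ in range(100):
    fb.write(0)          # a burst of writes to one page
fb.write(2)
fb.run_deferred_work()
print(fb.faults, fb.flushed)
```

In this model a burst of 100 writes to one page costs a single fault, and the display is updated once per delay period with only the pages that were touched.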
Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver
On Thu, 11 Jan 2007 19:22:45 -0500 Jaya Kumar [EMAIL PROTECTED] wrote:
> On 1/11/07, Andrew Morton [EMAIL PROTECTED] wrote:
> > That's all very interesting. Please don't dump a bunch of new implementation concepts like this on us with no description of what it does, why it does it and why it does it in this particular manner.
>
> Hi Andrew,
>
> Actually, I didn't dump without description. :-) I had posted an RFC and an explanation of the design to the lists. Here's an archive link to that post.
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=116583546411423&w=2
>
> I wasn't sure whether to include that description with the patch email because it was long.

Yes, please always include the full description as an integral part of the patch. In fact, it's very often best to communicate this information permanently, via code comments.

> From that email:
>
> ---
> This is there in order to hide the latency associated with updating the display (500ms to 800ms). The method used is to fake a framebuffer in memory. Then use pagefaults followed by delayed unmapping and only then do the actual framebuffer update. To explain this better, the usage scenario is like this:
>
> - userspace app like Xfbdev mmaps framebuffer
> - driver handles and sets up nopage and page_mkwrite handlers
> - app tries to write to mmapped vaddress
> - get pagefault and reaches driver's nopage handler
> - driver's nopage handler finds and returns physical page ( no actual framebuffer )
> - write so get page_mkwrite where we add this page to a list
> - also schedules a workqueue task to be run after a delay
> - app continues writing to that page with no additional cost
> - the workqueue task comes in and unmaps the pages on the list, then completes the work associated with updating the framebuffer
> - app tries to write to the address (that has now been unmapped)
> - get pagefault and the above sequence occurs again
>
> The desire is roughly to allow bursty framebuffer writes to occur.
> Then after some time when hopefully things have gone quiet, we go and really update the framebuffer. For this type of nonvolatile high latency display, the desired image is the final image rather than intermediate stages which is why it's okay to not update for each write that is occurring.

OK, makes sense. The whole idea is neat.

> > What is the theory of operation here? Presumably this is a performance optimisation to permit batching of the copying from user memory into the framebuffer card? If so, how much performance does it gain?
>
> Yes, you are right. Updating the E-Ink display currently requires about 500ms - 800ms. It is a non-volatile display and as such it is typically used in a manner where only the final image is important. As a result, being able to avoid the bursts of IO associated with screen activity and only write the final result is attractive.
>
> I have not done any performance benchmarks. I'm not sure exactly what to compare. I imagine one case would be using write() to deliver the image updates and the other case would be mmap(), memcpy(). The latter would win because it's hiding all the intermediate writes.
>
> > I expect the benefit will be large, and could be increased if you were to add a small delay between first-touch and writeback to the display. Let's talk about that a bit.
>
> Agreed. Though I may be misunderstanding what you mean by first-touch.

First modification - when the page goes from clean to dirty (and page_mkwrite gets called)

> Currently, I do a schedule_delayed_work and leave 1s between when the page_mkwrite callback indicating the first touch is received and when the deferred IO is processed to actually deliver the data to the display.

oh, doh - I missed the fact that you're already adding a delay.

> I picked 1s because it rounds up the display latency. I imagine increasing the delay further may make it miss some desirable display activity. For example, a slider indicating progress of music may be slower than optimal.
> Perhaps I should make the delay a module parameter and leave the choice to the user?

Don't know - your call. It would be interesting to know if this trick is applicable to any other framebuffer drivers.

> > Is the optimisation applicable to other drivers? If so, should it be generalised into library code somewhere?
>
> I think the deferred IO code would be useful to devices that have slow updates and where only the final result is important. So far, this is the only device I've encountered that has this characteristic.

OK.

> > I guess the export of page_mkclean() makes sense for this application. The use of lock_page_nosync() is wrong. It can still sleep, and here it's inside spinlock. And we don't want to export __lock_page_nosync() to modules. I suggest you convert the list locking here to a mutex and use lock_page().
>
> Oops, sorry about that. I will correct it.

Thanks. Consider adding a nice long Overview of operation comment in there too.
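The trade-off around the delay value discussed above can be made concrete with a rough calculation. The following is a simplified sketch, not a measurement: `count_flushes` is a hypothetical helper that assumes the deferred work runs exactly `delay_ms` after the first write following the previous update, and it ignores the 500ms - 800ms the display itself needs.

```python
def count_flushes(write_times_ms, delay_ms):
    """Count display updates under first-touch scheduling.

    Assumption: the first write after an update schedules the next update
    delay_ms later; every write landing before then is coalesced into it.
    """
    flushes = 0
    flush_at = None  # time at which the pending deferred work runs, if any
    for t in sorted(write_times_ms):
        if flush_at is not None and t >= flush_at:
            flushes += 1             # pending work ran before this write
            flush_at = None
        if flush_at is None:
            flush_at = t + delay_ms  # first touch: schedule the work
    if flush_at is not None:
        flushes += 1                 # the last pending work eventually runs
    return flushes


# Ten writes, one every 100 ms, under three different delays.
burst = range(0, 1000, 100)
for delay in (0, 250, 1000):
    print(delay, count_flushes(burst, delay))
```

Under these assumptions a longer delay coalesces more writes into each update, at the cost of the display lagging further behind the application, which is the concern raised about the progress slider.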
Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver
Jaya Kumar wrote:
> On 1/11/07, Andrew Morton [EMAIL PROTECTED] wrote:
> > That's all very interesting. Please don't dump a bunch of new implementation concepts like this on us with no description of what it does, why it does it and why it does it in this particular manner.
>
> Hi Andrew,
>
> Actually, I didn't dump without description. :-) I had posted an RFC and an explanation of the design to the lists. Here's an archive link to that post.
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=116583546411423&w=2
>
> I wasn't sure whether to include that description with the patch email because it was long. From that email:
>
> ---
> This is there in order to hide the latency associated with updating the display (500ms to 800ms). The method used is to fake a framebuffer in memory. Then use pagefaults followed by delayed unmapping and only then do the actual framebuffer update. To explain this better, the usage scenario is like this:
>
> - userspace app like Xfbdev mmaps framebuffer
> - driver handles and sets up nopage and page_mkwrite handlers
> - app tries to write to mmapped vaddress
> - get pagefault and reaches driver's nopage handler
> - driver's nopage handler finds and returns physical page ( no actual framebuffer )
> - write so get page_mkwrite where we add this page to a list
> - also schedules a workqueue task to be run after a delay
> - app continues writing to that page with no additional cost
> - the workqueue task comes in and unmaps the pages on the list, then completes the work associated with updating the framebuffer

Have you thought about implementing a traditional write-back cache using the dirty bits, rather than unmapping the page?

--
SUSE Labs, Novell Inc.
Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver
On Thu, 2007-01-11 at 19:22 -0500, Jaya Kumar wrote:
> Agreed. Though I may be misunderstanding what you mean by first-touch. Currently, I do a schedule_delayed_work and leave 1s between when the page_mkwrite callback indicating the first touch is received and when the deferred IO is processed to actually deliver the data to the display. I picked 1s because it rounds up the display latency. I imagine increasing the delay further may make it miss some desirable display activity. For example, a slider indicating progress of music may be slower than optimal. Perhaps I should make the delay a module parameter and leave the choice to the user?

How about implementing the sync_page() aop? Then you could force the flush using msync(MS_SYNC).

Hmm... that might require more surgery but the idea would work I think.
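From the application side, forcing a flush with msync(MS_SYNC) might look roughly like the sketch below. It is an illustration only: a temporary file stands in for the framebuffer device, `write_and_sync` is a hypothetical helper, and it relies on CPython's `mmap.flush()` issuing `msync()` with `MS_SYNC` on Unix. Whether the driver would actually be notified is exactly the open question in this thread.

```python
import mmap
import os
import tempfile


def write_and_sync(path, offset, data):
    """Write data into a shared mapping of path, then block until it is synced."""
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.fstat(fd).st_size
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as fb:
            fb[offset:offset + len(data)] = data  # ordinary stores to the mapping
            fb.flush()                            # msync(MS_SYNC) in CPython on Unix
    finally:
        os.close(fd)


# Demo against a 4 KiB temporary file standing in for the framebuffer.
demo_fd, demo_path = tempfile.mkstemp()
os.write(demo_fd, bytes(4096))
os.close(demo_fd)
write_and_sync(demo_path, 16, b"\xff" * 8)
os.unlink(demo_path)
```

An application that needs a frame on the display immediately could call such a helper instead of waiting for the delayed work to fire.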
Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver
On Fri, 12 Jan 2007 08:15:45 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote:
> How about implementing the sync_page() aop?

That got deleted in Jens's tree - the unplugging rework.