Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver

2007-02-17 Thread Jaya Kumar

On 1/12/07, Nick Piggin <[EMAIL PROTECTED]> wrote:
> Jaya Kumar wrote:
> > - write so get page_mkwrite where we add this page to a list
> > - also schedules a workqueue task to be run after a delay
> > - app continues writing to that page with no additional cost
> > - the workqueue task comes in and unmaps the pages on the list, then
> >   completes the work associated with updating the framebuffer
>
> Have you thought about implementing a traditional write-back cache using
> the dirty bits, rather than unmapping the page?

Ah, sorry, I erred in my description. I'm not unmapping pages, I'm
calling page_mkclean which uses the dirty bits.

Thanks,
jaya
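
The distinction drawn above, write-protecting a page and clearing its dirty
state rather than unmapping it, can be tried out in userspace. What follows is
a minimal sketch under stated assumptions, not the driver's code: mprotect()
and a SIGSEGV handler stand in for page_mkclean() and page_mkwrite(), and the
names tracker_init, tracker_flush, on_write_fault and NPAGES are made up for
illustration. POSIX does not list mprotect() as async-signal-safe, so calling
it from the handler is a Linux-oriented shortcut.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 4			/* hypothetical buffer size in pages */

static unsigned char *fb;		/* stand-in "framebuffer" memory */
static size_t page_size;
static volatile sig_atomic_t dirty[NPAGES];
static volatile sig_atomic_t nfaults;

/* First write to a clean page lands here (the page_mkwrite analogue). */
static void on_write_fault(int sig, siginfo_t *si, void *ctx)
{
	uintptr_t addr = (uintptr_t)si->si_addr;
	uintptr_t base = (uintptr_t)fb;
	size_t idx;

	(void)sig;
	(void)ctx;
	if (addr < base || addr >= base + NPAGES * page_size)
		_exit(1);		/* fault outside our buffer: real crash */

	idx = (addr - base) / page_size;
	dirty[idx] = 1;
	nfaults++;
	/* Make the page writable so further writes cost nothing. */
	mprotect(fb + idx * page_size, page_size, PROT_READ | PROT_WRITE);
}

/* Map a read-only buffer and install the fault handler. */
static int tracker_init(void)
{
	struct sigaction sa;

	page_size = (size_t)sysconf(_SC_PAGESIZE);
	fb = mmap(NULL, NPAGES * page_size, PROT_READ,
		  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (fb == MAP_FAILED)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_write_fault;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGSEGV, &sa, NULL);
}

/* page_mkclean analogue: write-protect dirty pages, return how many. */
static int tracker_flush(void)
{
	int n = 0;

	assert(fb != NULL);
	for (size_t i = 0; i < NPAGES; i++) {
		if (!dirty[i])
			continue;
		mprotect(fb + i * page_size, page_size, PROT_READ);
		dirty[i] = 0;
		n++;
	}
	return n;
}
```

Writing twice to the same page costs one fault; after tracker_flush() the next
write faults again, which mirrors the "repeat the same scheme" behaviour
described in the thread while the page itself stays mapped throughout.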
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver

2007-01-11 Thread Andrew Morton
On Fri, 12 Jan 2007 08:15:45 +0100
Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> How about implementing the sync_page() aop?

That got deleted in Jens's tree - the unplugging rework.


Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver

2007-01-11 Thread Peter Zijlstra
On Thu, 2007-01-11 at 19:22 -0500, Jaya Kumar wrote:

> Agreed. Though I may be misunderstanding what you mean by first-touch.
> Currently, I do a schedule_delayed_work and leave 1s between when the
> page_mkwrite callback indicating the first touch is received and when
> the deferred IO is processed to actually deliver the data to the
> display. I picked 1s because it rounds up the display latency. I
> imagine increasing the delay further may make it miss some desirable
> display activity. For example, a slider indicating progress of music
> may be slower than optimal. Perhaps I should make the delay a module
> parameter and leave the choice to the user?

How about implementing the sync_page() aop? Then you could force the
flush using msync(MS_SYNC). 

Hmm... that might require more surgery but the idea would work I think.
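
For reference, this is roughly what the userspace side of that idea looks
like on an ordinary shared file mapping. It is only a sketch: write_and_sync
is a hypothetical helper, and whether msync(MS_SYNC) on the framebuffer device
would trigger a display update depends entirely on the driver-side work
proposed above, which the thread does not show being implemented.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write msg into a shared mapping of fd and force it out with
 * msync(MS_SYNC). Returns 0 on success, -1 on any failure.
 */
static int write_and_sync(int fd, const char *msg)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	char *map;
	int ret;

	assert(strlen(msg) < len);
	if (ftruncate(fd, (off_t)len) != 0)
		return -1;
	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return -1;

	strcpy(map, msg);			/* dirty the mapped page */
	ret = msync(map, len, MS_SYNC);		/* block until written back */
	munmap(map, len);
	return ret;
}
```

An application that wants an immediate update would call msync() after its
last write instead of waiting for the delayed work to run.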



Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver

2007-01-11 Thread Nick Piggin

Jaya Kumar wrote:
> On 1/11/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > That's all very interesting.
> >
> > Please don't dump a bunch of new implementation concepts like this on us
> > with no description of what it does, why it does it and why it does it in
> > this particular manner.
>
> Hi Andrew,
>
> Actually, I didn't dump without description. :-) I had posted an RFC
> and an explanation of the design to the lists. Here's an archive link
> to that post.
> http://marc.theaimsgroup.com/?l=linux-kernel&m=116583546411423&w=2
>
> I wasn't sure whether to include that description with the patch email
> because it was long.
>
> From that email:
>
> ---
> This is there in order to hide the latency
> associated with updating the display (500ms to 800ms). The method used
> is to fake a framebuffer in memory. Then use pagefaults followed by delayed
> unmapping and only then do the actual framebuffer update. To explain this
> better, the usage scenario is like this:
>
> - userspace app like Xfbdev mmaps framebuffer
> - driver handles and sets up nopage and page_mkwrite handlers
> - app tries to write to mmaped vaddress
> - get pagefault and reaches driver's nopage handler
> - driver's nopage handler finds and returns physical page ( no
>   actual framebuffer )
> - write so get page_mkwrite where we add this page to a list
> - also schedules a workqueue task to be run after a delay
> - app continues writing to that page with no additional cost
> - the workqueue task comes in and unmaps the pages on the list, then
>   completes the work associated with updating the framebuffer

Have you thought about implementing a traditional write-back cache using
the dirty bits, rather than unmapping the page?

--
SUSE Labs, Novell Inc.


Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver

2007-01-11 Thread Andrew Morton
On Thu, 11 Jan 2007 19:22:45 -0500
"Jaya Kumar" <[EMAIL PROTECTED]> wrote:

> On 1/11/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > That's all very interesting.
> >
> > Please don't dump a bunch of new implementation concepts like this on us
> > with no description of what it does, why it does it and why it does it in
> > this particular manner.
> 
> Hi Andrew,
> 
> Actually, I didn't dump without description. :-) I had posted an RFC
> and an explanation of the design to the lists. Here's an archive link
> to that post. 
> http://marc.theaimsgroup.com/?l=linux-kernel&m=116583546411423&w=2
> I wasn't sure whether to include that description with the patch email
> because it was long.

Yes, please always include the full description as an integral part of the
patch.  In fact, it's very often best to communicate this information
permanently, via code comments.

> From that email:
> ---
> This is there in order to hide the latency
> associated with updating the display (500ms to 800ms). The method used
> is to fake a framebuffer in memory. Then use pagefaults followed by delayed
> unmapping and only then do the actual framebuffer update. To explain this
> better, the usage scenario is like this:
> 
> - userspace app like Xfbdev mmaps framebuffer
> - driver handles and sets up nopage and page_mkwrite handlers
> - app tries to write to mmaped vaddress
> - get pagefault and reaches driver's nopage handler
> - driver's nopage handler finds and returns physical page ( no
>   actual framebuffer )
> - write so get page_mkwrite where we add this page to a list
> - also schedules a workqueue task to be run after a delay
> - app continues writing to that page with no additional cost
> - the workqueue task comes in and unmaps the pages on the list, then
>   completes the work associated with updating the framebuffer
> - app tries to write to the address (that has now been unmapped)
> - get pagefault and the above sequence occurs again
> 
> The desire is roughly to allow bursty framebuffer writes to occur.
> Then after some time when hopefully things have gone quiet, we go and
> really update the framebuffer. For this type of nonvolatile high latency
> display, the desired image is the final image rather than intermediate
> stages which is why it's okay to not update for each write that is
> occurring.

OK, makes sense.  The whole idea is neat.

> 
> >
> > What is the "theory of operation" here?
> >
> > Presumably this is a performance optimisation to permit batching of the
> > copying from user memory into the framebuffer card?  If so, how much
> > performance does it gain?
> 
> Yes, you are right. Updating the E-Ink display currently requires
> about 500ms - 800ms. It is a non-volatile display and as such it is
> typically used in a manner where only the final image is important. As
> a result, being able to avoid the bursts of IO associated with screen
> activity and only write the final result is attractive.
> 
> I have not done any performance benchmarks. I'm not sure exactly what
> to compare. I imagine one case would be using write() to deliver
> the image updates and the other case would be mmap(), memcpy(). The
> latter would win because it's hiding all the intermediate "writes".
> 
> >
> > I expect the benefit will be large, and could be increased if you were to
> > add a small delay between first-touch and writeback to the display.  Let's
> > talk about that a bit.
> 
> Agreed. Though I may be misunderstanding what you mean by first-touch.

First modification - when the page goes from clean to dirty (and
page_mkwrite gets called)

> Currently, I do a schedule_delayed_work and leave 1s between when the
> page_mkwrite callback indicating the first touch is received and when
> the deferred IO is processed to actually deliver the data to the
> display.

oh, doh - I missed the fact that you're already adding a delay.

> I picked 1s because it rounds up the display latency. I
> imagine increasing the delay further may make it miss some desirable
> display activity. For example, a slider indicating progress of music
> may be slower than optimal. Perhaps I should make the delay a module
> parameter and leave the choice to the user?

Don't know - your call.

It would be interesting to know if this trick is applicable to any other
framebuffer drivers.

> >
> > Is the optimisation applicable to other drivers?  If so, should it be
> > generalised into library code somewhere?
> 
> I think the deferred IO code would be useful to devices that have slow
> updates and where only the final result is important. So far, this is
> the only device I've encountered that has this characteristic.

OK.

> >
> > I guess the export of page_mkclean() makes sense for this application.
> >
> > The use of lock_page_nosync() is wrong.  It can still sleep, and here it's
> > inside spinlock.  And we don't want to export __lock_page_nosync() to
> > modules.  I suggest you convert the list locking here to a mutex and use
> > lock_page().
> >
> 
> 

Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver

2007-01-11 Thread Jaya Kumar

On 1/11/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> That's all very interesting.
>
> Please don't dump a bunch of new implementation concepts like this on us
> with no description of what it does, why it does it and why it does it in
> this particular manner.


Hi Andrew,

Actually, I didn't dump without description. :-) I had posted an RFC
and an explanation of the design to the lists. Here's an archive link
to that post. http://marc.theaimsgroup.com/?l=linux-kernel&m=116583546411423&w=2
I wasn't sure whether to include that description with the patch email
because it was long.


From that email:

---
This is there in order to hide the latency
associated with updating the display (500ms to 800ms). The method used
is to fake a framebuffer in memory. Then use pagefaults followed by delayed
unmapping and only then do the actual framebuffer update. To explain this
better, the usage scenario is like this:

- userspace app like Xfbdev mmaps framebuffer
- driver handles and sets up nopage and page_mkwrite handlers
- app tries to write to mmaped vaddress
- get pagefault and reaches driver's nopage handler
- driver's nopage handler finds and returns physical page ( no
 actual framebuffer )
- write so get page_mkwrite where we add this page to a list
- also schedules a workqueue task to be run after a delay
- app continues writing to that page with no additional cost
- the workqueue task comes in and unmaps the pages on the list, then
 completes the work associated with updating the framebuffer
- app tries to write to the address (that has now been unmapped)
- get pagefault and the above sequence occurs again

The desire is roughly to allow bursty framebuffer writes to occur.
Then after some time when hopefully things have gone quiet, we go and
really update the framebuffer. For this type of nonvolatile high latency
display, the desired image is the final image rather than intermediate
stages which is why it's okay to not update for each write that is
occurring.
---



> What is the "theory of operation" here?
>
> Presumably this is a performance optimisation to permit batching of the
> copying from user memory into the framebuffer card?  If so, how much
> performance does it gain?


Yes, you are right. Updating the E-Ink display currently requires
about 500ms - 800ms. It is a non-volatile display and as such it is
typically used in a manner where only the final image is important. As
a result, being able to avoid the bursts of IO associated with screen
activity and only write the final result is attractive.

I have not done any performance benchmarks. I'm not sure exactly what
to compare. I imagine one case would be using write() to deliver
the image updates and the other case would be mmap(), memcpy(). The
latter would win because it's hiding all the intermediate "writes".



> I expect the benefit will be large, and could be increased if you were to
> add a small delay between first-touch and writeback to the display.  Let's
> talk about that a bit.


Agreed. Though I may be misunderstanding what you mean by first-touch.
Currently, I do a schedule_delayed_work and leave 1s between when the
page_mkwrite callback indicating the first touch is received and when
the deferred IO is processed to actually deliver the data to the
display. I picked 1s because it rounds up the display latency. I
imagine increasing the delay further may make it miss some desirable
display activity. For example, a slider indicating progress of music
may be slower than optimal. Perhaps I should make the delay a module
parameter and leave the choice to the user?



> Is the optimisation applicable to other drivers?  If so, should it be
> generalised into library code somewhere?


I think the deferred IO code would be useful to devices that have slow
updates and where only the final result is important. So far, this is
the only device I've encountered that has this characteristic.



> I guess the export of page_mkclean() makes sense for this application.
>
> The use of lock_page_nosync() is wrong.  It can still sleep, and here it's
> inside spinlock.  And we don't want to export __lock_page_nosync() to
> modules.  I suggest you convert the list locking here to a mutex and use
> lock_page().



Oops, sorry about that. I will correct it.

Thanks,
jayakumar
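
No benchmark was posted, but the saving expected from hiding intermediate
writes can be estimated with a toy model. The sketch below is an illustration
only. It assumes that the first write after a flush arms a single timer, that
writes arriving before the timer fires are absorbed by that one flush, and
that timestamps are given in ascending milliseconds. deferred_updates is a
hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count display updates for n write timestamps (ms, ascending) when the
 * first write to a clean buffer schedules one flush delay_ms later and
 * every write before that flush is folded into it.
 */
static size_t deferred_updates(const unsigned long *t_ms, size_t n,
			       unsigned long delay_ms)
{
	size_t updates = 0;
	int pending = 0;
	unsigned long flush_at = 0;

	assert(t_ms != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* A flush that came due before this write has already run. */
		if (pending && t_ms[i] >= flush_at) {
			updates++;
			pending = 0;
		}
		/* First touch after a flush arms the timer once. */
		if (!pending) {
			pending = 1;
			flush_at = t_ms[i] + delay_ms;
		}
	}
	return updates + (pending ? 1 : 0);
}
```

For writes at 0, 10, 20, 500, 999, 1500, 1600 and 4000 ms with a 1000 ms
delay, the model gives 3 display updates where one update per write would
give 8. At 500ms to 800ms per update that difference dominates everything
else.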


Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver

2007-01-11 Thread Andrew Morton
On Thu, 11 Jan 2007 15:24:27 +0100
Jaya Kumar <[EMAIL PROTECTED]> wrote:

> +/* this is to find and return the vmalloc-ed fb pages */
> +static struct page* hecubafb_vm_nopage(struct vm_area_struct *vma, 
> + unsigned long vaddr, int *type)
> +{
> + unsigned long offset;
> + struct page *page;
> + struct fb_info *info = vma->vm_private_data;
> +
> + offset = (vaddr - vma->vm_start) + (vma->vm_pgoff << PAGE_SHIFT);
> + if (offset >= (DPY_W*DPY_H)/8)
> + return NOPAGE_SIGBUS;
> +
> + page = vmalloc_to_page(info->screen_base + offset);
> + if (!page)
> + return NOPAGE_OOM;
> +
> + get_page(page);
> + if (type)
> + *type = VM_FAULT_MINOR;
> + return page;
> +}
> +
> +static void hecubafb_work(struct work_struct *work)
> +{
> + struct hecubafb_par *par = container_of(work, struct hecubafb_par,
> + deferred_work.work);
> + struct list_head *node, *next;
> + struct page_list *cur;
> +
> + /* here we unmap the pages, then do all deferred IO */
> + spin_lock(&par->lock);
> + list_for_each_safe(node, next, &par->pagelist) {
> + cur = list_entry(node, struct page_list, list);
> + list_del(node);
> + lock_page_nosync(cur->page);
> + page_mkclean(cur->page);
> + unlock_page(cur->page);
> + kfree(cur);
> + }
> + spin_unlock(&par->lock);
> + hecubafb_dpy_update(par);
> +}
> +
> +static int hecubafb_page_mkwrite(struct vm_area_struct *vma, 
> + struct page *page)
> +{
> + struct fb_info *info = vma->vm_private_data;
> + struct hecubafb_par *par = info->par;
> + struct page_list *new;
> +
> + /* this is a callback we get when userspace first tries to 
> + write to the page. we schedule a workqueue. that workqueue 
> + will eventually unmap the touched pages and execute the 
> + deferred framebuffer IO. then if userspace touches a page 
> + again, we repeat the same scheme */
> +
> + new = kzalloc(sizeof(struct page_list), GFP_KERNEL);
> + if (!new)
> + return -ENOMEM;
> + new->page = page;
> +
> + /* protect against the workqueue changing the page list */
> + spin_lock(&par->lock);
> + list_add(&new->list, &par->pagelist);
> + spin_unlock(&par->lock);
> +
> + /* come back in 1s to process the deferred IO */
> + schedule_delayed_work(&par->deferred_work, HZ);
> + return 0;
> +}

That's all very interesting.

Please don't dump a bunch of new implementation concepts like this on us
with no description of what it does, why it does it and why it does it in
this particular manner.

What is the "theory of operation" here?

Presumably this is a performance optimisation to permit batching of the
copying from user memory into the framebuffer card?  If so, how much
performance does it gain?

I expect the benefit will be large, and could be increased if you were to
add a small delay between first-touch and writeback to the display.  Let's
talk about that a bit.

Is the optimisation applicable to other drivers?  If so, should it be
generalised into library code somewhere?

I guess the export of page_mkclean() makes sense for this application.

The use of lock_page_nosync() is wrong.  It can still sleep, and here it's
inside spinlock.  And we don't want to export __lock_page_nosync() to
modules.  I suggest you convert the list locking here to a mutex and use
lock_page().
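
The shape of that locking change can be sketched in userspace. This is an
analogue under assumptions, not the corrected driver: a pthread mutex stands
in for a kernel mutex, an int stands in for struct page *, and
clean_page_may_sleep is a hypothetical placeholder for the lock_page(),
page_mkclean(), unlock_page() sequence. The point it illustrates is that the
list is guarded by a lock that may be held across a call that sleeps.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct page_entry {
	struct page_entry *next;
	int page;			/* stand-in for struct page * */
};

struct dirty_list {
	pthread_mutex_t lock;		/* sleeping lock, not a spinlock */
	struct page_entry *head;
};

/* Hypothetical placeholder for lock_page() + page_mkclean() + unlock_page(). */
static void clean_page_may_sleep(int page)
{
	(void)page;
}

/* page_mkwrite analogue: remember the page that was just dirtied. */
static int dirty_list_add(struct dirty_list *dl, int page)
{
	struct page_entry *e = malloc(sizeof(*e));

	assert(dl != NULL);
	if (!e)
		return -1;
	e->page = page;
	pthread_mutex_lock(&dl->lock);
	e->next = dl->head;
	dl->head = e;
	pthread_mutex_unlock(&dl->lock);
	return 0;
}

/* Work-function analogue: drain the list; blocking under the mutex is legal. */
static int dirty_list_drain(struct dirty_list *dl)
{
	int n = 0;

	assert(dl != NULL);
	pthread_mutex_lock(&dl->lock);
	while (dl->head) {
		struct page_entry *e = dl->head;

		dl->head = e->next;
		clean_page_may_sleep(e->page);
		free(e);
		n++;
	}
	pthread_mutex_unlock(&dl->lock);
	return n;
}
```

With a spinlock in place of the mutex, the call in the middle of the drain
loop is exactly the spot where sleeping would be forbidden.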





[PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver

2007-01-11 Thread Jaya Kumar
This patch adds support for the Hecuba/E-Ink display with deferred IO.
I welcome your feedback and advice.

Signed-off-by: Jaya Kumar <[EMAIL PROTECTED]>

---

 drivers/video/Kconfig    |   13 +
 drivers/video/Makefile   |    1 
 drivers/video/hecubafb.c |  568 +++
 mm/filemap.c             |    1 
 mm/rmap.c                |    1 
 5 files changed, 584 insertions(+)

---

diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index 4e83f01..cf2dc50 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -568,6 +568,19 @@ config FB_IMAC
help
  This is the frame buffer device driver for the Intel-based Macintosh
 
+config FB_HECUBA
+   tristate "Hecuba board support"
+   depends on FB && X86 && MMU
+   select FB_CFB_FILLRECT
+   select FB_CFB_COPYAREA
+   select FB_CFB_IMAGEBLIT
+   help
+ This enables support for the Hecuba board. This driver was tested 
+ with an E-Ink 800x600 display and x86 SBCs through a 16 bit GPIO
+ interface (8 bit data, 4 bit control). If you anticipate using
+ this driver, say Y or M; otherwise say N. You must specify the
+ GPIO IO address to be used for setting control and data.
+
 config FB_HGA
tristate "Hercules mono graphics support"
depends on FB && X86
diff --git a/drivers/video/Makefile b/drivers/video/Makefile
index 309a26d..b4d5655 100644
--- a/drivers/video/Makefile
+++ b/drivers/video/Makefile
@@ -67,6 +67,7 @@ obj-$(CONFIG_FB_SGIVW)            += sgivwfb.o
 obj-$(CONFIG_FB_ACORN)            += acornfb.o
 obj-$(CONFIG_FB_ATARI)            += atafb.o
 obj-$(CONFIG_FB_MAC)              += macfb.o
+obj-$(CONFIG_FB_HECUBA)           += hecubafb.o
 obj-$(CONFIG_FB_HGA)              += hgafb.o
 obj-$(CONFIG_FB_IGA)              += igafb.o
 obj-$(CONFIG_FB_APOLLO)           += dnfb.o
diff --git a/drivers/video/hecubafb.c b/drivers/video/hecubafb.c
new file mode 100644
index 000..4740b92
--- /dev/null
+++ b/drivers/video/hecubafb.c
@@ -0,0 +1,568 @@
+/*
+ * linux/drivers/video/hecubafb.c -- FB driver for Hecuba controller
+ *
+ * Copyright (C) 2006, Jaya Kumar 
+ * This work was sponsored by CIS(M) Sdn Bhd
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file COPYING in the main directory of this archive for
+ * more details.
+ *
+ * Layout is based on skeletonfb.c by James Simmons and Geert Uytterhoeven.
+ * This work was possible because of apollo display code from E-Ink's website
+ * http://support.eink.com/community
+ * All information used to write this code is from public material made
+ * available by E-Ink on its support site. Some commands such as 0xA4
+ * were found by looping through cmd=0x00 thru 0xFF and supplying random
+ * values. There are other commands that the display is capable of,
+ * beyond the 5 used here but they are more complex. 
+ *
+ * This driver is written to be used with the Hecuba display controller
+ * board, and tested with the EInk 800x600 display in 1 bit mode. 
+ * The interface between Hecuba and the host is TTL based GPIO. The
+ * GPIO requirements are 8 writable data lines and 6 lines for control.
+ * Only 4 of the controls are actually used here but 6 for future use.
+ * The driver requires the IO addresses for data and control GPIO at 
+ * load time. It is also possible to use this display with a standard 
+ * PC parallel port. 
+ *
+ * General notes:
+ * - User must set hecubafb_enable=1 to enable it
+ * - User must set dio_addr=0xIOADDR cio_addr=0xIOADDR c2io_addr=0xIOADDR
+ *
+ */
+
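+#include <asm/uaccess.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/fb.h>
+#include <linux/init.h>
+#include <linux/platform_device.h>
+#include <linux/list.h>
+
+/* to support deferred IO */
+#include <linux/rmap.h>
+#include <linux/pagemap.h>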
+
+/* Apollo controller specific defines */
+#define APOLLO_START_NEW_IMG   0xA0
+#define APOLLO_STOP_IMG_DATA   0xA1
+#define APOLLO_DISPLAY_IMG     0xA2
+#define APOLLO_ERASE_DISPLAY   0xA3
+#define APOLLO_INIT_DISPLAY    0xA4
+
+/* Hecuba interface specific defines */
+/* WUP is inverted, CD is inverted, DS is inverted */
+#define HCB_NWUP_BIT   0x01
+#define HCB_NDS_BIT    0x02
+#define HCB_RW_BIT     0x04
+#define HCB_NCD_BIT    0x08
+#define HCB_ACK_BIT    0x80
+
+/* Display specific information */
+#define DPY_W 600
+#define DPY_H 800
+
+struct hecubafb_par {
+   struct delayed_work deferred_work;
+   unsigned long dio_addr;
+   unsigned long cio_addr;
+   unsigned long c2io_addr;
+   unsigned char ctl;
+   atomic_t ref_count;
+   atomic_t vma_count;
+   struct fb_info *info;
+   unsigned int irq;
+   spinlock_t lock;
+   struct list_head pagelist;
+};
+
+struct page_list {
+   struct list_head list;
+   struct page *page;
+}; 
+
+static struct fb_fix_screeninfo hecubafb_fix __initdata = {
+   .id =   "hecubafb",
+   .type = FB_TYPE_PACKED_PIXELS,
+   

Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver

2007-01-11 Thread Jaya Kumar

On 1/11/07, Andrew Morton [EMAIL PROTECTED] wrote:

> That's all very interesting.
>
> Please don't dump a bunch of new implementation concepts like this on us
> with no description of what it does, why it does it and why it does it in
> this particular manner.


Hi Andrew,

Actually, I didn't dump without description. :-) I had posted an RFC
and an explanation of the design to the lists. Here's an archive link
to that post. http://marc.theaimsgroup.com/?l=linux-kernel&m=116583546411423&w=2
I wasn't sure whether to include that description with the patch email
because it was long.


From that email:

---
This is there in order to hide the latency
associated with updating the display (500ms to 800ms). The method used
is to fake a framebuffer in memory, then use pagefaults followed by delayed
unmapping, and only then do the actual framebuffer update. To explain this
better, the usage scenario is like this:

- userspace app like Xfbdev mmaps framebuffer
- driver handles and sets up nopage and page_mkwrite handlers
- app tries to write to mmapped vaddress
- get pagefault and reaches driver's nopage handler
- driver's nopage handler finds and returns physical page ( no
 actual framebuffer )
- write so get page_mkwrite where we add this page to a list
- also schedules a workqueue task to be run after a delay
- app continues writing to that page with no additional cost
- the workqueue task comes in and unmaps the pages on the list, then
 completes the work associated with updating the framebuffer
- app tries to write to the address (that has now been unmapped)
- get pagefault and the above sequence occurs again

The desire is roughly to allow bursty framebuffer writes to occur.
Then after some time when hopefully things have gone quiet, we go and
really update the framebuffer. For this type of nonvolatile high latency
display, the desired image is the final image rather than intermediate
stages, which is why it's okay to not update for each write that is
occurring.
---
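
The scheme described in that email can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the driver: struct defio_sim, SIM_NPAGES and the sim_* functions are hypothetical, time is passed in explicitly in milliseconds, and a boolean per page stands in for the fault/write-protect machinery (a later message in this thread notes the driver calls page_mkclean rather than unmapping). It shows how a burst of writes collapses into one display update per delay window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NPAGES 8  /* hypothetical framebuffer size, in pages */

struct defio_sim {
    bool dirty[SIM_NPAGES];   /* page written since the last flush */
    bool armed;               /* deferred work is currently scheduled */
    long deadline_ms;         /* when the scheduled work should run */
    long delay_ms;            /* first-touch-to-flush delay */
    unsigned int flushes;     /* display updates performed so far */
    unsigned int pages_sent;  /* pages pushed to the display so far */
};

void sim_init(struct defio_sim *s, long delay_ms)
{
    *s = (struct defio_sim){ .delay_ms = delay_ms };
}

/* The workqueue analogue: runs only once the deadline has passed. */
void sim_run_work(struct defio_sim *s, long now_ms)
{
    if (!s->armed || now_ms < s->deadline_ms)
        return;
    for (size_t i = 0; i < SIM_NPAGES; i++) {
        if (s->dirty[i]) {
            s->dirty[i] = false;   /* page becomes clean again */
            s->pages_sent++;
        }
    }
    s->armed = false;
    s->flushes++;
}

/* An application write to one page at time now_ms. */
void sim_write(struct defio_sim *s, long now_ms, size_t page)
{
    assert(page < SIM_NPAGES);
    sim_run_work(s, now_ms);       /* let any overdue work run first */
    if (!s->dirty[page]) {
        /* First touch of a clean page: the page_mkwrite analogue. */
        s->dirty[page] = true;
        if (!s->armed) {
            s->armed = true;
            s->deadline_ms = now_ms + s->delay_ms;
        }
    }
    /* Further writes to an already-dirty page cost nothing extra. */
}
```

With a 1000 ms delay, writes at 0, 10, 20 and 500 ms are covered by a single update, and a write at 1500 ms starts a new window.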



> What is the theory of operation here?
>
> Presumably this is a performance optimisation to permit batching of the
> copying from user memory into the framebuffer card?  If so, how much
> performance does it gain?


Yes, you are right. Updating the E-Ink display currently requires
about 500ms - 800ms. It is a non-volatile display and as such it is
typically used in a manner where only the final image is important. As
a result, being able to avoid the bursts of IO associated with screen
activity and only write the final result is attractive.

I have not done any performance benchmarks. I'm not sure exactly what
to compare. I imagine one case would be using write() to deliver
the image updates and the other case would be mmap(), memcpy(). The
latter would win because it's hiding all the intermediate writes.



> I expect the benefit will be large, and could be increased if you were to
> add a small delay between first-touch and writeback to the display.  Let's
> talk about that a bit.


Agreed. Though I may be misunderstanding what you mean by first-touch.
Currently, I do a schedule_delayed_work and leave 1s between when the
page_mkwrite callback indicating the first touch is received and when
the deferred IO is processed to actually deliver the data to the
display. I picked 1s because it rounds up the display latency. I
imagine increasing the delay further may make it miss some desirable
display activity. For example, a slider indicating progress of music
may be slower than optimal. Perhaps I should make the delay a module
parameter and leave the choice to the user?



> Is the optimisation applicable to other drivers?  If so, should it be
> generalised into library code somewhere?


I think the deferred IO code would be useful to devices that have slow
updates and where only the final result is important. So far, this is
the only device I've encountered that has this characteristic.



> I guess the export of page_mkclean() makes sense for this application.
>
> The use of lock_page_nosync() is wrong.  It can still sleep, and here it's
> inside a spinlock.  And we don't want to export __lock_page_nosync() to
> modules.  I suggest you convert the list locking here to a mutex and use
> lock_page().



Oops, sorry about that. I will correct it.

Thanks,
jayakumar


Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver

2007-01-11 Thread Andrew Morton
On Thu, 11 Jan 2007 19:22:45 -0500
Jaya Kumar [EMAIL PROTECTED] wrote:

> On 1/11/07, Andrew Morton [EMAIL PROTECTED] wrote:
> > That's all very interesting.
> >
> > Please don't dump a bunch of new implementation concepts like this on us
> > with no description of what it does, why it does it and why it does it in
> > this particular manner.
>
> Hi Andrew,
>
> Actually, I didn't dump without description. :-) I had posted an RFC
> and an explanation of the design to the lists. Here's an archive link
> to that post.
> http://marc.theaimsgroup.com/?l=linux-kernel&m=116583546411423&w=2
> I wasn't sure whether to include that description with the patch email
> because it was long.

Yes, please always include the full description as an integral part of the
patch.  In fact, it's very often best to communicate this information
permanently, via code comments.

> From that email:
> ---
> This is there in order to hide the latency
> associated with updating the display (500ms to 800ms). The method used
> is to fake a framebuffer in memory, then use pagefaults followed by delayed
> unmapping, and only then do the actual framebuffer update. To explain this
> better, the usage scenario is like this:
>
> - userspace app like Xfbdev mmaps framebuffer
> - driver handles and sets up nopage and page_mkwrite handlers
> - app tries to write to mmapped vaddress
> - get pagefault and reaches driver's nopage handler
> - driver's nopage handler finds and returns physical page ( no
>   actual framebuffer )
> - write so get page_mkwrite where we add this page to a list
> - also schedules a workqueue task to be run after a delay
> - app continues writing to that page with no additional cost
> - the workqueue task comes in and unmaps the pages on the list, then
>   completes the work associated with updating the framebuffer
> - app tries to write to the address (that has now been unmapped)
> - get pagefault and the above sequence occurs again
>
> The desire is roughly to allow bursty framebuffer writes to occur.
> Then after some time when hopefully things have gone quiet, we go and
> really update the framebuffer. For this type of nonvolatile high latency
> display, the desired image is the final image rather than intermediate
> stages, which is why it's okay to not update for each write that is
> occurring.

OK, makes sense.  The whole idea is neat.

 
 
> > What is the theory of operation here?
> >
> > Presumably this is a performance optimisation to permit batching of the
> > copying from user memory into the framebuffer card?  If so, how much
> > performance does it gain?
>
> Yes, you are right. Updating the E-Ink display currently requires
> about 500ms - 800ms. It is a non-volatile display and as such it is
> typically used in a manner where only the final image is important. As
> a result, being able to avoid the bursts of IO associated with screen
> activity and only write the final result is attractive.
>
> I have not done any performance benchmarks. I'm not sure exactly what
> to compare. I imagine one case would be using write() to deliver
> the image updates and the other case would be mmap(), memcpy(). The
> latter would win because it's hiding all the intermediate writes.
>
> > I expect the benefit will be large, and could be increased if you were to
> > add a small delay between first-touch and writeback to the display.  Let's
> > talk about that a bit.
>
> Agreed. Though I may be misunderstanding what you mean by first-touch.

First modification - when the page goes from clean to dirty (and
page_mkwrite gets called)

> Currently, I do a schedule_delayed_work and leave 1s between when the
> page_mkwrite callback indicating the first touch is received and when
> the deferred IO is processed to actually deliver the data to the
> display.

oh, doh - I missed the fact that you're already adding a delay.

> I picked 1s because it rounds up the display latency. I
> imagine increasing the delay further may make it miss some desirable
> display activity. For example, a slider indicating progress of music
> may be slower than optimal. Perhaps I should make the delay a module
> parameter and leave the choice to the user?

Don't know - your call.

It would be interesting to know if this trick is applicable to any other
framebuffer drivers.

 
> > Is the optimisation applicable to other drivers?  If so, should it be
> > generalised into library code somewhere?
>
> I think the deferred IO code would be useful to devices that have slow
> updates and where only the final result is important. So far, this is
> the only device I've encountered that has this characteristic.

OK.

 
> > I guess the export of page_mkclean() makes sense for this application.
> >
> > The use of lock_page_nosync() is wrong.  It can still sleep, and here it's
> > inside a spinlock.  And we don't want to export __lock_page_nosync() to
> > modules.  I suggest you convert the list locking here to a mutex and use
> > lock_page().
>
> Oops, sorry about that. I will correct it.

Thanks.  Consider adding a nice long Overview of operation comment in
there too.


Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver

2007-01-11 Thread Nick Piggin

Jaya Kumar wrote:

> On 1/11/07, Andrew Morton [EMAIL PROTECTED] wrote:
>
> > That's all very interesting.
> >
> > Please don't dump a bunch of new implementation concepts like this on us
> > with no description of what it does, why it does it and why it does it in
> > this particular manner.



> Hi Andrew,
>
> Actually, I didn't dump without description. :-) I had posted an RFC
> and an explanation of the design to the lists. Here's an archive link
> to that post.
> http://marc.theaimsgroup.com/?l=linux-kernel&m=116583546411423&w=2
>
> I wasn't sure whether to include that description with the patch email
> because it was long.


> From that email:
>
> ---
> This is there in order to hide the latency
> associated with updating the display (500ms to 800ms). The method used
> is to fake a framebuffer in memory, then use pagefaults followed by delayed
> unmapping, and only then do the actual framebuffer update. To explain this
> better, the usage scenario is like this:
>
> - userspace app like Xfbdev mmaps framebuffer
> - driver handles and sets up nopage and page_mkwrite handlers
> - app tries to write to mmapped vaddress
> - get pagefault and reaches driver's nopage handler
> - driver's nopage handler finds and returns physical page ( no
>   actual framebuffer )
> - write so get page_mkwrite where we add this page to a list
> - also schedules a workqueue task to be run after a delay
> - app continues writing to that page with no additional cost
> - the workqueue task comes in and unmaps the pages on the list, then
>   completes the work associated with updating the framebuffer


Have you thought about implementing a traditional write-back cache using
the dirty bits, rather than unmapping the page?

--
SUSE Labs, Novell Inc.
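
The write-back idea raised above can be illustrated with a small software model. It is a sketch only: in the kernel the dirty state lives in the page tables and is set by hardware, whereas here wb_store() sets a bit by hand, and struct wb_cache, WB_PAGE_SZ, WB_NPAGES and the wb_* functions are all hypothetical names. What it shows is the shape of the approach: the mapping stays in place, writes mark a page dirty, and a periodic pass copies only dirty pages and clears their bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WB_PAGE_SZ 16   /* hypothetical, tiny "page" for illustration */
#define WB_NPAGES  4

struct wb_cache {
    uint8_t shadow[WB_NPAGES][WB_PAGE_SZ];  /* memory the app writes to */
    uint8_t device[WB_NPAGES][WB_PAGE_SZ];  /* stand-in for the display */
    uint32_t dirty;                         /* one dirty bit per page */
};

/* An application store: update the shadow copy and mark the page dirty. */
void wb_store(struct wb_cache *c, unsigned int page, unsigned int off,
              uint8_t value)
{
    assert(page < WB_NPAGES && off < WB_PAGE_SZ);
    c->shadow[page][off] = value;
    c->dirty |= 1u << page;
}

/* Write-back pass: push every dirty page, return how many were pushed. */
unsigned int wb_writeback(struct wb_cache *c)
{
    unsigned int pushed = 0;

    for (unsigned int p = 0; p < WB_NPAGES; p++) {
        uint32_t bit = 1u << p;

        if (!(c->dirty & bit))
            continue;
        /* Clear the bit before copying so a later store re-dirties it. */
        c->dirty &= ~bit;
        memcpy(c->device[p], c->shadow[p], WB_PAGE_SZ);
        pushed++;
    }
    return pushed;
}
```

Clean pages are never touched by the write-back pass, so the cost of an update scales with how much of the framebuffer actually changed.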




Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver

2007-01-11 Thread Peter Zijlstra
On Thu, 2007-01-11 at 19:22 -0500, Jaya Kumar wrote:

> Agreed. Though I may be misunderstanding what you mean by first-touch.
> Currently, I do a schedule_delayed_work and leave 1s between when the
> page_mkwrite callback indicating the first touch is received and when
> the deferred IO is processed to actually deliver the data to the
> display. I picked 1s because it rounds up the display latency. I
> imagine increasing the delay further may make it miss some desirable
> display activity. For example, a slider indicating progress of music
> may be slower than optimal. Perhaps I should make the delay a module
> parameter and leave the choice to the user?

How about implementing the sync_page() aop? Then you could force the
flush using msync(MS_SYNC). 

Hmm... that might require more surgery but the idea would work I think.
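
For reference, the userspace half of that idea looks roughly like the following. This is a sketch against an ordinary temporary file, not a framebuffer device: whether msync(MS_SYNC) on a framebuffer mapping would reach a driver-side flush hook is exactly the open question in this thread, and nothing here demonstrates it. The function name mmap_write_and_sync() and the scratch file path are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write msg through a shared mapping of a scratch file, then force it
 * out synchronously. Returns 0 on success, -1 on any failure.
 */
int mmap_write_and_sync(const char *msg)
{
    char path[] = "/tmp/defio-demoXXXXXX";  /* hypothetical scratch file */
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    size_t n = strlen(msg);
    char *map;
    int rc = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                           /* file vanishes on close */
    if (ftruncate(fd, (off_t)len) != 0)
        goto out_close;

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_close;

    if (n >= len)
        n = len - 1;
    memcpy(map, msg, n);                    /* dirty the page via the mapping */

    /* Blocks until the dirty page has been written back. */
    if (msync(map, len, MS_SYNC) == 0)
        rc = 0;

    munmap(map, len);
out_close:
    close(fd);
    return rc;
}
```

An application drawing to such a display could call msync() at the end of a frame rather than waiting for the delay to expire, provided the driver honoured it.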



Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver

2007-01-11 Thread Andrew Morton
On Fri, 12 Jan 2007 08:15:45 +0100
Peter Zijlstra [EMAIL PROTECTED] wrote:

> How about implementing the sync_page() aop?

That got deleted in Jens's tree - the unplugging rework.