Re: Invitation and RFC: Linux Plumbers Device Tree track proposed

2015-04-14 Thread Arnd Bergmann
On Tuesday 14 April 2015 10:36:15 Rob Herring wrote:
 
 4) Identifying additional people who should attend the device tree track.
 
 Arnd Bergmann
 Matt Porter
 Jon Loeliger
 Gaurav Minocha

Sorry, I won't be there. I should have replied earlier, but I'll be on
parental leave at the time.

Arnd
--
To unsubscribe from this list: send the line unsubscribe linux-embedded in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Handling of modular boards

2012-05-04 Thread Arnd Bergmann
On Friday 04 May 2012, Mark Brown wrote:
 Quite a few reference platforms (including Wolfson ones, which is why
 I'm particularly interested) use replaceable modules to allow
 configuration changes.  Since we can often identify the configuration at
 runtime we should ideally do that but currently there's no infrastructure 
 to help with that, generally this seems to be done in arch code for the
 machine but this doesn't scale when even the CPU might change and isn't
 terribly device tree compatible either.
 
 For reference the code for current Wolfson plugin modules is in
 arch/arm/mach-s3c64xx/mach-crag6410-module.c.

Hi Mark,

Thanks for getting the discussion started. I've seen the same issue come
up for arch/arm/mach-ux500/board-mop500*uib.c and for the beaglebone.
I'm sure there are many more, but we should make sure that everyone
of these can live with whatever we come up with.

 The most obvious current fit here is the MFD subsystem but it feels like
 we need slightly different infrastructure from what MFD currently
 provides.  MFD is really set up to handle platform devices with a core
 and linear ranges of resources fanning out from that core since they're
 really oriented around chips.  In contrast these boards are more about
 remapping random collections of potentially unrelated resources and
 instantiating devices on all sorts of buses and share more with board
 files.
 
 I'm just starting to put some stuff together for this so I was wondering
 if anyone had been thinking about this and had any bright ideas for how
 to handle it, and also if people think that MFD is a good fit for this
 or if we should split the silicon MFDs from these PCBs.

One idea that I've heard before is to put device tree fragments into the
kernel and dynamically add them to the device tree that was passed by the
boot loader whenever we detect the presence of a specific device.
This obviously means it works only for boards using DT for booting, but
it allows us to use some infrastructure that we already have.

Another idea was to put all the possible extensions into the device tree
for a given board and disable them by default, putting it into the
responsibility of the boot loader to enable the one that is actually
being used. This has serious scalability problems when there are many
possible extensions and also relies more on the boot loader than I would
like.

An intermediate solution that I really like is the ability to
store device tree fragments on the extension boards themselves, but that
can only work for new designs and causes problems when that information
is not actually correct.

Arnd


Re: Handling of modular boards

2012-05-04 Thread Arnd Bergmann
On Friday 04 May 2012, Wolfgang Denk wrote:
 In message 201205041934.08830.a...@arndb.de you wrote:
 
  One idea that I've heard before is to put device tree fragments into the
  kernel and dynamically add them to the device tree that was passed by the
  boot loader whenever we detect the presence of a specific device.
  This obviously means it works only for boards using DT for booting, but
  it allows us to use some infrastructure that we already have.
  
  Another idea was to put all the possible extensions into the device tree
  for a given board and disable them by default, putting it into the
  responsibility of the boot loader to enable the one that is actually
  being used. This has serious scalability problems when there are many
  possible extensions and also relies more on the boot loader than I would
  like.
 
 On the other hand, some of the issues we're trying to solve here
 for the kernel are also present in the boot loader, so it needs to
 do this anyway - whether by inserting new or modifying (enabling or
 disabling) existing properties in the DT is not really relevant here.

I haven't seen a case where the add-on board is actually required
for booting. What examples are you thinking of?

Arnd


Re: Handling of modular boards

2012-05-04 Thread Arnd Bergmann
On Friday 04 May 2012, Wolfgang Denk wrote:
 There are systems (and I bet it will be a growing number) where U-Boot
 itself uses the DT for configuration.  Also, there are functions that
 are needed both by the boot loader and the kernel - for example to
 display a splash screen the boot loader needs to initialize the
 display, so it must be able to detect which type of LCD is attached
 (resolution, color-depth, orientation) - the device tree comes in very
 handy here.  Why should Linux re-do all such things?

Sure, there are a lot of things that the boot loader can use from the
device tree, but I'm not sure if the LCD panel connection fits into
the same category as the devices that Mark was thinking of.

Anyway, display controllers are definitely something that needs to
be handled in some way, which may or may not be the same way we
handle more complex collections of arbitrary devices.

Arnd


Re: RFC: android logger feedback request

2011-12-22 Thread Arnd Bergmann
On Thursday 22 December 2011, NeilBrown wrote:
 If you created a 'logbuf' filesystem that used libfs to provide a single
 directory in which privileged processes could create files then you wouldn't
 need the kernel to know the allowed logs: radio, events, main, system.
 The size could be set by ftruncate() (by privileged users again) rather than
 being hardcoded.
 
 You would define 'read' and 'write' much like you currently do to create a
 list of datagrams in a circular buffer and replace the ioctls by more standard
 
 LOGGER_GET_LOG_BUF_SIZE would use 'stat' and the st_blocks field
 LOGGER_GET_LOG_LEN would use 'stat' and the st_size field
 LOGGER_GET_NEXT_ENTRY_LEN could use the FIONREAD ioctl
 LOGGER_FLUSH_LOG could use ftruncate
 
 The result would be much the same amount of code, but an interface which has
 fewer details hard-coded and is generally more versatile and accessible.
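
A rough user-space sketch of the quoted mapping, assuming an ordinary file on tmpfs stands in for the proposed 'logbuf' filesystem (the mail leaves the exact st_size/st_blocks semantics open). The helper names log_set_size(), log_get_size() and log_flush() are hypothetical; only ftruncate() and fstat() are real POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: give the log its capacity with ftruncate() instead
 * of a size hard-coded in the kernel. Returns 0 on success, -1 on error. */
int log_set_size(int fd, off_t size)
{
    assert(size >= 0);
    return ftruncate(fd, size);
}

/* Hypothetical helper: query the size with fstat() where the Android
 * driver uses a LOGGER_* ioctl. Returns -1 on error. */
off_t log_get_size(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return st.st_size;
}

/* Hypothetical helper: "flush" the log by truncating it to zero bytes,
 * as the quoted mail suggests for LOGGER_FLUSH_LOG. */
int log_flush(int fd)
{
    return ftruncate(fd, 0);
}
```

On a regular file, log_set_size(fd, 65536) followed by log_get_size(fd) reports 65536, and after log_flush(fd) it reports 0.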

I like the idea and was going to suggest something very similar, but I wonder
if we could take the approach even further:

* Remove all kernel code for this and use a user space library together
  with tmpfs
* prepopulate the tmpfs at boot time with all the log buffers in the right
  size, and set the maximum file system size so that they cannot grow further.
* Have minimal formatting in the log buffer: A few bytes header (ring buffer
  start and end)
* Mandate that user space must use mmap and atomic operations to reserve space
  in the log and write to the files.
* Provide a tool to get the log data out of the buffer again in a race-free way.

Since any program that is allowed to write to the buffer can overwrite all
existing information in it anyway, I think we don't actually need any kernel
help in maintaining consistency of the contents either -- the reader will
simply discard any inconsistent data. The main thing we would not be able to guarantee
without kernel help is proving the origin of individual messages, but I'm
not sure if that is a design goal.
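
A minimal sketch of the mmap-plus-atomics idea, assuming a hypothetical layout (struct logbuf: a 64-bit head counter followed by the ring) that would live in an mmap()ed tmpfs file; here it is just a caller-supplied memory region. Writers reserve space with a single C11 atomic_fetch_add(), so concurrent writers get disjoint ranges. It deliberately omits a commit marker, which a real reader would need in order to tell half-written records from complete ones and discard the former.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: header followed by the ring storage. */
struct logbuf {
    _Atomic uint64_t head;  /* total bytes ever reserved, only grows */
    uint32_t size;          /* size of data[] in bytes */
    unsigned char data[];   /* ring storage */
};

/* Copy bytes into the ring at logical position pos, wrapping at size. */
static void ring_put(struct logbuf *lb, uint64_t pos, const void *src, size_t len)
{
    const unsigned char *s = src;

    for (size_t i = 0; i < len; i++)
        lb->data[(pos + i) % lb->size] = s[i];
}

/* Copy bytes out of the ring from logical position pos, wrapping at size. */
static void ring_get(const struct logbuf *lb, uint64_t pos, void *dst, size_t len)
{
    unsigned char *d = dst;

    for (size_t i = 0; i < len; i++)
        d[i] = lb->data[(pos + i) % lb->size];
}

/* Reserve room for one record with a single atomic add, then fill it in.
 * A writer that laps the ring simply overwrites the oldest data.
 * Returns the record's logical position. */
uint64_t logbuf_write(struct logbuf *lb, const void *msg, uint32_t len)
{
    uint64_t rec = sizeof(len) + (uint64_t)len;
    uint64_t pos;

    assert(rec <= lb->size);
    pos = atomic_fetch_add(&lb->head, rec);
    ring_put(lb, pos, &len, sizeof(len));        /* length, host byte order */
    ring_put(lb, pos + sizeof(len), msg, len);   /* payload */
    return pos;
}

/* Read the record at position pos into out. Returns its length, or 0 if
 * the stored length is implausible (e.g. overwritten), so it is discarded. */
uint32_t logbuf_read(const struct logbuf *lb, uint64_t pos, void *out, uint32_t max)
{
    uint32_t len;

    ring_get(lb, pos, &len, sizeof(len));
    if (len > max || sizeof(len) + (uint64_t)len > lb->size)
        return 0;
    ring_get(lb, pos + sizeof(len), out, len);
    return len;
}
```

Each record is a 4-byte length followed by the payload; the position returned by logbuf_write() is what a reader tool would pass to logbuf_read().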

Arnd


Re: dma_unmap_single() lacking cache sync on some archs?

2011-09-27 Thread Arnd Bergmann
On Tuesday 27 September 2011 09:55:02 Håvard Skinnemoen wrote:
 
 On Tue, Sep 27, 2011 at 5:13 AM, Arvid Brodin arvid.bro...@enea.com wrote:
  [Resending with CC to affected parties]
 
  Hi,
 
  I would expect cache synchronization for DMA_TO_DEVICE and DMA_BIDIRECTIONAL
  when dma_map_single() is called, and for DMA_FROM_DEVICE and
  DMA_BIDIRECTIONAL when dma_unmap_single() is called.
  
  However, on some architectures (at least avr32, blackfin, ...), cache
  synchronization only happens when dma_map_single() is called (and then
  irrespective of DMA direction). dma_unmap_single() is a no-op for these
  archs.
 
  See e.g. 
  http://lxr.linux.no/#linux+v3.0.4/arch/avr32/include/asm/dma-mapping.h#L117
 
  Isn't this a bug?
 
 I don't think so. What do other architectures do?
 
 We always need to sync before the transfer because if there is dirty
 data in the cache, it might get written to RAM during the transfer,
 which would be bad. Then, since the relevant cache lines are already
 clean and invalid, and the CPU is not allowed to access the buffer
 during the transfer, there's no need to sync again when the transfer
 is complete.

On some architectures, e.g. ARMv6 and higher, a speculative prefetch might
cause cache lines to be read again while an inbound DMA is on its way.
On those architectures you need to discard cache lines before reading from
the buffer. In fact, for DMA_FROM_DEVICE you also need to flush or invalidate
the cache for the buffer before the transfer and invalidate the cache again
after the transfer.

Most architectures however do not require this.

Arnd


Re: [PATCHv3] UBI: new module ubiblk: block layer on top of UBI

2011-09-09 Thread Arnd Bergmann
On Friday 09 September 2011, Artem Bityutskiy wrote:
 On Thu, 2011-09-08 at 17:26 +0200, Arnd Bergmann wrote:
  On Tuesday 06 September 2011, Artem Bityutskiy wrote:
   Not sure about the bus approach - David, could you take a look at it
   please? If we can handle errors there - then we could indeed re-use the
   UBI control device. We could even re-use the ioctl data structures for
   UBI volumes creation/removal - we have plenty of space there reserved
   for future extensions.
  
  I would generally recommend using new ioctl commands. ioctl numbers
  are cheap, but complexity in data structures is not, because every
  user who wants to deal with the data structures has to understand
  them. Also, changing the ABI is always tricky since you have to
  provide backward and forwards compatibility with existing kernels
  and with existing user space.
 
 Hmm, what do we do if ubiblk module is not loaded, and UBI would have
 to return an error (because the block device cannot be created), how
 will UBI know that ubiblk is not there? Any direct call to ubiblk from
 UBI would be a direct dependency and would require ubiblk to be always
 loaded, which is bad.

No, the idea of this approach is that the main ubi driver creates
a device, which can always succeed. It's just that there won't
be a block device node created, because that is part of what
the ubiblk driver does.

Compare this to how scsi works:

A scsi host driver scans the host controller and adds scsi devices
internal to the kernel, each of which has a specific type (disk,
tape, ...). If the scsi disk driver is loaded, it will create
a blockdev for each disk device. It doesn't matter in which order
the drivers are loaded though.

In case of ubiblk, it's similar, except that there is no way for
the ubi layer to know if some partition should be a block device or
not, so it relies on user space to tell it.

Well, actually, you /could/ encode this somewhere so that the main
ubi layer creates different kinds of devices based on what it finds:
a ubiblk_device when it finds a partition that was created as a
block device or gluebi_device for gluebi or a ubifs volume.
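
A toy user-space model of the matching described above. This is not the kernel's bus_type API, and all names are hypothetical: adding a device always succeeds, and a device gets bound whenever a driver for its type is present, whichever of the two is registered first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEVS 8
#define MAX_DRVS 8

struct toy_device { const char *type; const char *bound_to; };
struct toy_driver { const char *name; const char *handles; };

/* The "bus" just remembers everything registered on it. */
struct toy_bus {
    struct toy_device *devs[MAX_DEVS];
    size_t ndevs;
    const struct toy_driver *drvs[MAX_DRVS];
    size_t ndrvs;
};

/* Bind if the driver handles this device type ("probe" succeeds). */
static void try_bind(struct toy_device *dev, const struct toy_driver *drv)
{
    if (!dev->bound_to && strcmp(dev->type, drv->handles) == 0)
        dev->bound_to = drv->name;
}

/* Adding a device never fails; it is matched against known drivers. */
void bus_add_device(struct toy_bus *bus, struct toy_device *dev)
{
    assert(bus->ndevs < MAX_DEVS);
    bus->devs[bus->ndevs++] = dev;
    for (size_t i = 0; i < bus->ndrvs; i++)
        try_bind(dev, bus->drvs[i]);
}

/* Loading a driver later picks up devices that already exist. */
void bus_add_driver(struct toy_bus *bus, const struct toy_driver *drv)
{
    assert(bus->ndrvs < MAX_DRVS);
    bus->drvs[bus->ndrvs++] = drv;
    for (size_t i = 0; i < bus->ndevs; i++)
        try_bind(bus->devs[i], drv);
}
```

In this model a volume created before the block driver is loaded stays unbound until bus_add_driver() runs, and never causes the lower layer to call up into the driver.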

 IOW, we need a blocking mechanism to call the upper layer's function
 (ubiblk) from the lower layer (UBI) which can return an error, and which
 allows checking whether ubiblk exists at all. Do we have such a mechanism?
 
 Actually the fact of invoking upper layers from lower makes me worry.

Yes, you should not do that.

Arnd


Re: [PATCHv3] UBI: new module ubiblk: block layer on top of UBI

2011-09-08 Thread Arnd Bergmann
On Tuesday 06 September 2011, Artem Bityutskiy wrote:
 Not sure about the bus approach - David, could you take a look at it
 please? If we can handle errors there - then we could indeed re-use the
 UBI control device. We could even re-use the ioctl data structures for
 UBI volumes creation/removal - we have plenty of space there reserved
 for future extensions.

I would generally recommend using new ioctl commands. ioctl numbers
are cheap, but complexity in data structures is not, because every
user who wants to deal with the data structures has to understand
them. Also, changing the ABI is always tricky since you have to
provide backward and forwards compatibility with existing kernels
and with existing user space.

Arnd


Re: [PATCHv3] UBI: new module ubiblk: block layer on top of UBI

2011-08-25 Thread Arnd Bergmann
On Thursday 25 August 2011, Artem Bityutskiy wrote:
 On Wed, 2011-08-24 at 18:23 +0200, Arnd Bergmann wrote:
  That should be fine, yes. I would probably put them into the same
  header file though if they are in the same number space even
  when you use them on distinct devices.
  
  It does feel a little clumsy to have yet another character device
  to manage the block devices though. What do you think about one
  of these alternative approaches:
  
  * When the ubi block device driver gets loaded, create one block
    device per volume and let the user deal with permissions for
    the devices instead of having to first create them as well.
 
 I think this is wasteful. Why should I have block devices which I do not
 need? If I have 4 UBI volumes, and need only one ubiblk, why should I
 waste my resources for 3 more of them (e.g., I do not want to waste
 memory for struct inode for each sysfs entry which these useless block
 devices will add). Also, will this mean 3 more block devices registered?
 
 I think it is much uglier to have 3 dummy block devices and confuse
 users than have one nice control character device. For the sake of not
 having a separate control chardev?

The cost of a block device node in the kernel is rather low. Nowadays,
sysfs does not even permanently use inodes for entries, it has a much
more compact internal representation IIRC.

The main advantage of this approach is not having to set up the 
block device at all, it would just be there, which e.g. makes it
possible to put a root file system on it or do something else without
requiring a user space tool to issue an ioctl.

Evidently you can do everything you need even with that user space
tool, but IMHO the complexity of doing that is way bigger than
just creating the block devices right away.

  * Use the existing UBI control device for the block devices as
    well and just add two more ioctls to create the devices.
    You can add a logical bus_type for this so that the ubi block
    driver gets automatically loaded and matched with the device when
    one is created using the control device.
 
 This sounds better IMHO, but I am still not sure that adding another
 dummy bus and exposing it in sysfs and more complexity in the ubiblk
 code is more elegant and less wasteful than just creating a separate
 chardev...

It's not a dummy bus; in this approach it would be the bus that gets
used by all ubiblk devices, which is a very common concept by itself.
It's more like the classic understanding of a 'device class' that Greg
wants to see get replaced by bus_types in the kernel.

Arnd


Re: [PATCHv3] UBI: new module ubiblk: block layer on top of UBI

2011-08-24 Thread Arnd Bergmann
On Monday 22 August 2011, Artem Bityutskiy wrote:
 
 On Wed, 2011-08-17 at 15:17 +0200, david.wag...@free-electrons.com
 wrote:
  Questions:
  ==
  I wasn't sure what magic ioctl number to use, so I settled on using the same
  one as a part of UBI: 'O', which was so far only used by UBI but on a higher
  range and leaving some room for UBI to add ioctls (for now, it uses
  'O'/0x00-0x06 and ubiblk uses 'O'/0x10-0x11). Is it ok or should ubiblk use
  a different number/range?
 
 I think this is OK to share them between UBI and ubiblk, as long as this
 is documented.

That should be fine, yes. I would probably put them into the same
header file though if they are in the same number space even
when you use them on distinct devices.
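
A sketch of what keeping both ranges in one header could look like, using the standard _IO()/_IOC_TYPE()/_IOC_NR() macros from <linux/ioctl.h>. The 'O'/0x00-0x06 and 'O'/0x10-0x11 ranges are taken from the quoted mail and the magic is defined locally here; the UBIBLK_IOCADD/UBIBLK_IOCDEL names and the argument-less _IO() encoding are hypothetical.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Magic used by the UBI volume ioctls ('O'/0x00-0x06 per the mail). */
#define UBI_VOL_IOC_MAGIC 'O'

/* Hypothetical ubiblk commands higher up in the same number space, kept in
 * the same header so the whole 'O' allocation is visible in one place. */
#define UBIBLK_IOC_FIRST 0x10
#define UBIBLK_IOC_LAST  0x11
#define UBIBLK_IOCADD    _IO(UBI_VOL_IOC_MAGIC, 0x10)
#define UBIBLK_IOCDEL    _IO(UBI_VOL_IOC_MAGIC, 0x11)

/* Nonzero if cmd belongs to the ubiblk range rather than to UBI proper. */
static inline int is_ubiblk_cmd(unsigned int cmd)
{
    return _IOC_TYPE(cmd) == UBI_VOL_IOC_MAGIC &&
           _IOC_NR(cmd) >= UBIBLK_IOC_FIRST &&
           _IOC_NR(cmd) <= UBIBLK_IOC_LAST;
}
```

Adding a third command later is one more #define in the same place, which is the sense in which ioctl numbers are cheap compared to extending a shared data structure.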

It does feel a little clumsy to have yet another character device
to manage the block devices though. What do you think about one
of these alternative approaches:

* When the ubi block device driver gets loaded, create one block
  device per volume and let the user deal with permissions for
  the devices instead of having to first create them as well.
* Use the existing UBI control device for the block devices as
  well and just add two more ioctls to create the devices.
  You can add a logical bus_type for this so that the ubi block
  driver gets automatically loaded and matched with the device when
  one is created using the control device.

Arnd


Re: architecture-independent I/O accessors

2009-08-18 Thread Arnd Bergmann
On Tuesday 18 August 2009 21:07:01 Wolfgang Denk wrote:
 Dear Arnd,
 
 Josh Boyer suggested you might provide some insight...
 
 I'm currently looking for a solution how to provide architecture
 independent I/O accessor functions to U-Boot. In the past, lots of
 code used direct pointer accesses, relying on the idea that volatile
 would be sufficient to convince the compiler and the hardware to do
 what was expected; some architectures (like ARM and others) used
 readl() / writel(), while others (like PPC) used in_8, in_le16,
 in_be16, in_le32, in_be32, in_le64, in_be64 etc.
 
 As we like to borrow code from Linux, I'm trying to find out what the
 big plan for Linux is.
 
 My understanding is that in Linux the ioreadX() / iowriteX() /
 ioreadXbe() / iowriteXbe() functions are supposed to provide
 architecture independent I/O accessors, and that the plain ioreadX()
 / iowriteX() functions (without the be) are always guaranteed to be
 little-endian on all architectures, while the be functions are,
 well, big-endian.  Is this understanding correct?

yes. Also, these functions are defined so that you can use them
both for memory mapped I/O *and* for programmed I/O (aka inl/outl).

 If yes, does that mean that in the future we will see more Linux code
 using ioreadX[be]() / iowriteX[be]()? So far I did not find much
 hints that support this approach - memory-barriers.txt has only a
 short sentence about these functions, with basically no explanation.

The most common ones are readl/writel, simply because they are better
known. For devices that only have memory mapped I/O, they are
by definition equivalent to ioread32/iowrite32.

The SATA drivers and others use ioread32/iowrite32 because that
lets the driver ignore the difference between PIO and MMIO.

 What I liked from the in_[le]X() / out_[le]X() accessors on PPC was
 that they allowed for type checking - the compiler would raise a
 warning when you used in_[le]16() to read from a 32 bit wide register.
 However, ioreadX[be]() / iowriteX[be]() use a void __iomem * cookie,
 so no type checking can be done.

Hmm, interesting. I was never aware of that difference. We should
probably change that in the kernel, to add type checking to all
of them.

Another difference on powerpc is that in_le32/out_le32 cannot be
used on PCI devices but only on SoC devices, because legacy iSeries
and pSeries need some additional magic for PCI accesses.

 Basically I have two questions:
 
 1) Can you make a statement which direction Linux is heading to?
Will more (new) code use ioreadX() / iowriteX()?

New subsystems will often use ioreadX/iowriteX by default, but
I expect existing code to keep using readl/writel, and new drivers
will keep using them as well.

 2) What would be your recommendation what we should do in U-Boot?
Provide for all architectures in_8, in_le16, in_be16, in_le32,
in_be32, in_le64, in_be64 etc. similar to what we have for the
Power architecture, well knowing that Linux will not follow that
route, or use ioreadX[be]() / iowriteX[be]() which does not provide
type checking, and which eventually does not find wider use in
Linux either? Or even something else - like ioreadX[be]() /
iowriteX[be]() with type checking added?

I think ioread32/iowrite32 and friends with type checking would
be the easiest. It would be nice to try adding type checking to
the kernel, just to see what breaks ;-)
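
A user-space sketch of type-checked, fixed-endian accessors in the spirit of in_le32()/in_be32(): each takes a pointer of the register's width, so passing a 16-bit register to a 32-bit accessor draws an incompatible-pointer-type warning. The rd_/wr_ names are hypothetical, the code relies on the GCC/Clang __BYTE_ORDER__ macro and __builtin_bswap* builtins, and unlike the kernel accessors it adds no I/O barriers and does not handle PIO.

```c
#include <assert.h>
#include <stdint.h>

/* Host-order fix-ups chosen at compile time. */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define LE16(x) (x)
#define LE32(x) (x)
#define BE32(x) __builtin_bswap32(x)
#else
#define LE16(x) __builtin_bswap16(x)
#define LE32(x) __builtin_bswap32(x)
#define BE32(x) (x)
#endif

/* Each accessor does one access of the register's width, then converts. */
static inline uint32_t rd_le32(const volatile uint32_t *reg)
{
    return LE32(*reg);
}

static inline uint32_t rd_be32(const volatile uint32_t *reg)
{
    return BE32(*reg);
}

static inline uint16_t rd_le16(const volatile uint16_t *reg)
{
    return (uint16_t)LE16(*reg);
}

static inline void wr_le32(volatile uint32_t *reg, uint32_t val)
{
    *reg = LE32(val);
}

static inline void wr_be32(volatile uint32_t *reg, uint32_t val)
{
    *reg = BE32(val);
}
```

Calling rd_le32() on a volatile uint16_t * is diagnosed by the compiler, which is the kind of check a void __iomem * cookie cannot provide.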

Arnd 


Re: [PATCH 06/14] Pramfs: Include files

2009-06-23 Thread Arnd Bergmann
On Tuesday 23 June 2009, David Woodhouse wrote:
 And dd on /dev/mem would work, surely?

Actually, reading from /dev/mem is only valid on real RAM. If the nvram
is part of an IO memory mapping, you have to do mmap()+memcpy() rather
than read(). So dd won't do it, but it's still easy to read from user
space.
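
A sketch of the mmap()+memcpy() approach, assuming a file descriptor for /dev/mem opened by a sufficiently privileged process (CONFIG_STRICT_DEVMEM may still refuse the mapping). copy_via_mmap() is a hypothetical helper. Plain memcpy() may use access widths that some devices do not tolerate, which is why kernel code uses memcpy_fromio() for I/O memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes at offset off of fd (e.g. a physical
 * address on /dev/mem) into dst with mmap()+memcpy() instead of read().
 * mmap() needs a page-aligned offset, so map from the enclosing page.
 * Returns 0 on success, -1 on failure. */
int copy_via_mmap(int fd, off_t off, void *dst, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t base;
    size_t delta, maplen;
    void *p;

    assert(off >= 0);
    if (len == 0)
        return 0;
    if (page <= 0)
        return -1;
    base = off - off % page;
    delta = (size_t)(off - base);
    maplen = delta + len;
    p = mmap(NULL, maplen, PROT_READ, MAP_SHARED, fd, base);
    if (p == MAP_FAILED)
        return -1;
    memcpy(dst, (const unsigned char *)p + delta, len);
    munmap(p, maplen);
    return 0;
}
```

The same helper works on an ordinary file, which is a convenient way to try it without touching /dev/mem.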

 I'd definitely recommend making it fixed-endian. Not doing so for JFFS2
 was a mistake I frequently regretted.

Right.

Arnd 


Re: [PATCH 06/14] Pramfs: Include files

2009-06-22 Thread Arnd Bergmann
On Monday 22 June 2009, Marco wrote:
 
 Sorry, I meant it's not currently possible. At the moment the only way
 to use it as rootfs is to copy all the data into an already mounted
 (empty) ram partition and reboot. However it's not my first item on my
 todo list because I think that it's possible to use it as rootfs but it
 isn't the standard use for this fs.

Well, it doesn't have to work right away. What I'm asking is to
define the data structures in a way that keeps the layout stable
across kernel updates. Since a future version of the file system
might support cross-endian image creation, it would be good to
define the data structures in a fixed endian mode already, so
you don't have to change it in the future.
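
A minimal sketch of a fixed-endian on-media layout, assuming every multi-byte field is kept as a little-endian byte array and converted with explicit helpers, so the image reads the same on any CPU. The struct and field names are hypothetical, not PRAMFS's; kernel code would more likely declare __le32/__le64 fields and use le32_to_cpu() and friends.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical superblock: byte arrays mean no padding and no dependence
 * on the byte order of the CPU that wrote the image. */
struct disk_super {
    uint8_t magic[4];
    uint8_t blocksize[4];
    uint8_t nblocks[8];
};

static inline void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));   /* least significant byte first */
}

static inline uint32_t get_le32(const uint8_t *p)
{
    uint32_t v = 0;

    for (int i = 3; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

static inline void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static inline uint64_t get_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}
```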

Arnd 


Re: [PATCH 06/14] Pramfs: Include files

2009-06-22 Thread Arnd Bergmann
On Monday 22 June 2009, Jörn Engel wrote:
 Four loops doing the same increment with different data types: long,
 u64, we32 (wrong-endian) and we64.  Compile with no optimizations.
 
 Results on my i386 notebook:
 long: 453953 us
 we32: 880273 us
 u64:  504214 us
 we64: 2259953 us
 loops: 1

(couldn't resist)

The we64 number is artificially high because the glibc bswap_64
implementation forces the conversion to be done on the stack.
Using __builtin_bswap64 makes this look more logical, and
makes your point even stronger (on core 2, using -m32):

long: 236792 us
we32: 500827 us
u64:  265990 us
we64: 757380 us
loops: 1
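
For reference, a wrong-endian increment written with the GCC/Clang builtins might look like this. The value stays byte-swapped in memory, so every increment is a swap, an add and a swap back, which presumably accounts for the extra time in the we32/we64 rows. we32_inc()/we64_inc() are hypothetical names, not taken from the benchmark source.

```c
#include <assert.h>
#include <stdint.h>

/* Increment a 32-bit counter stored in the opposite byte order. */
static inline uint32_t we32_inc(uint32_t v)
{
    return __builtin_bswap32(__builtin_bswap32(v) + 1);
}

/* Same for 64 bits; the builtin can be expanded inline by the compiler
 * instead of going through a library helper. */
static inline uint64_t we64_inc(uint64_t v)
{
    return __builtin_bswap64(__builtin_bswap64(v) + 1);
}
```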

Arnd 


Re: [PATCH 04/10] AXFS: axfs_inode.c

2008-08-22 Thread Arnd Bergmann
On Friday 22 August 2008, Phillip Lougher wrote:
  
  This looks very nice, but could use some comments about how the data is
  actually stored on disk. It took me some time to figure out that it actually
  allows tail merging into compressed blocks, which I was about to suggest
  you implement ;-). Cramfs doesn't have them, and I found that they are the
  main reason why squashfs compresses better than cramfs, besides the default
  block size, which you can change on either one.
 
 Squashfs has much larger block sizes than cramfs (last time I looked it 
 was limited to 4K blocks), and it compresses the metadata which helps to 
 get better compression.  But tail merging (fragments in Squashfs 
 terminology) is obviously a major reason why Squashfs gets good compression.

The *default* block size in cramfs is smaller than in squashfs, but they both
have user selectable block sizes. I found the impact of compressed metadata
to be almost zero. I hacked up a mksquashfs to avoid tail merging, and found
that the image size for squashfs and cramfs is practically identical if you
use the same block size and no tail merging.

 The AXFS code is rather obscure but it doesn't look to me that it does 
 tail merging.  The following code wouldn't work if the block in question 
 was a tail contained in a larger block.  It assumes the block extends to 
 the end of the compressed block (cblk_size - cnode_offset).

yes, I thought the same thing when I first read that code, and was about
to send a lengthy reply about how it should be changed when I saw that
it already does exactly that ;-).

Arnd 


Re: [PATCH 03/10] AXFS: axfs.h

2008-08-22 Thread Arnd Bergmann
On Friday 22 August 2008, Jared Hulbert wrote:
  This bytetable stuff looks overly complicated, both the data structure and
  the access method. It seems like you are implementing your own custom
  Huffman compression with this.
 
  Is the reason for the bytetable just to pack numbers efficiently, or do you
  have a different intention?
 
 It looks more complicated than it is.  I need a data structure that is
 64bit capable, easily read-in-place (remember this is designed to be
 an XIP fs), and highly space efficient.  Because it's XIP I didn't
 want something that required a lot of calculation nor something that
 made you incur a lot of cache misses.  So yes I just want to pack
 numbers in an easily read-in-place fashion.

ok, that makes sense.
 
 If I have an array of u64 numbers tracking small numbers (a[0] = 1;
 a[1] = 2;) just throwing that on media is a big waste.
 (0x0001; 0x0002)  Having different array types
 for different images such as arrays of u8, u16, u32, u64 becomes less
 efficient for 3, 5, 6 and 7 byte numbers, 3 bytes was a particularly
 interesting size for me.
 
 All I'm doing is removing the totally unnecessary zeros and aligning by bytes.
 Take an array of u64 like this :
 0x0005
 0x1001
 0x000a
 
 I strip off the unneeded leading zeros:
 0x05
 0x001001
 0x0a
 
 Then pack them to byte alignment:
 0x050010010a
 
 Sure it could be encoded more but that would make it harder to extract
 the data.  This way I can read the data in one, maybe two, cache
 misses.  A couple of shifts to deal with the alignment and endianness
 and we are done.

So do I understand right that 3 bytes is your minimum size, and going
smaller than that would not be helpful? Otherwise I would assume that
storing a '5' should only take one byte instead of three.

I don't understand yet why you store the length of each word separately
from the word. Most variable-length codes store that implicitly in
the data itself, e.g. in the upper three bits, so that for storing
0x5, 0x1001, 0xa, this could e.g. end up as 0x054010014a,
which is shorter than what you have, but not harder to decode.
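
A sketch of a self-describing variable-length code along these lines, assuming the top three bits of the first byte give the number of extra bytes (0-7) and the low five bits plus the extra bytes hold the value, most significant byte first. This exact layout is an assumption and does not reproduce the byte string above; it encodes 0x5, 0x1001, 0xa as the four bytes 05 30 01 0a. One trade-off for an XIP design: entries no longer sit at fixed offsets, so reaching entry i means decoding the preceding ones or keeping a separate index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 5 value bits in the first byte plus up to 7 extra bytes = 61 bits. */
#define VLC_MAX ((UINT64_C(1) << 61) - 1)

/* Encode v at out; returns the number of bytes written (1-8). */
size_t vlc_put(uint8_t *out, uint64_t v)
{
    unsigned extra = 0;

    assert(v <= VLC_MAX);
    while (extra < 7 && (v >> (5 + 8 * extra)) != 0)
        extra++;
    /* top 3 bits: extra byte count; low 5 bits: highest value bits */
    out[0] = (uint8_t)((extra << 5) | (v >> (8 * extra)));
    for (unsigned i = 1; i <= extra; i++)
        out[i] = (uint8_t)(v >> (8 * (extra - i)));
    return 1 + extra;
}

/* Decode one value from in into *v; returns the number of bytes consumed. */
size_t vlc_get(const uint8_t *in, uint64_t *v)
{
    unsigned extra = in[0] >> 5;
    uint64_t r = in[0] & 0x1f;

    for (unsigned i = 1; i <= extra; i++)
        r = (r << 8) | in[i];
    *v = r;
    return 1 + extra;
}
```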

  Did you see a significant size benefit over simply storing all metadata as
  uncompressed data structures like in cramfs?
 
 Yes. For some modest values of significant.  In terms of the amount of
 space required to track the metadata it is more dramatic.  For a small
 rootfs I can fit many of the data structures in an u8 array, while
 maintaining u64 compatibility.  Compared to dumping u64 arrays on media
 that's an 8X savings.  But it's an 8X savings of a smallish percentage
 of the image size.  The difference is more pronounced on a smaller
 (2MB) filesystem I tested but it was only ~5% if memory serves me
 correct.

If you can save 5% on a real-world file system, you have convinced me.

  Have you considered storing simple dentry/inode data in
  node_type==Compressed nodes?
 
 Yes, I thought a lot about that.  But I chose against it because I
 wanted read-in-place data structures for minimum RAM usage in the XIP
 case and I figure the way I do it would stat() faster.

ok.

Arnd 


Re: [PATCH 00/10] AXFS: Advanced XIP filesystem

2008-08-22 Thread Arnd Bergmann
On Friday 22 August 2008, Geert Uytterhoeven wrote:
 I gave AxFS a try on PS3 (ppc64, always use big-endian 64-bit for testing new
 code ;-).
 When mounting the image, I got the crash below:
 
 | attempt to access beyond end of device
 | loop0: rw=0, want=4920, limit=4912
 | Unable to handle kernel paging request for data at address 0x0028


Offset 0x28 is buffer_head->b_data, so it seems like sb_bread returns NULL,
which it does for out of range block numbers. I guess axfs_copy_block
should check for that condition, as it can happen on malicious file system
images.
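
The arithmetic behind the 0x28, assuming the leading field order of struct buffer_head in include/linux/buffer_head.h of that era (b_state, b_this_page, b_page, b_blocknr, b_size, b_data): on a 64-bit kernel five 8-byte fields precede b_data, so its offset is 5 * 8 = 40 = 0x28, and reading bh->b_data through a NULL bh faults at address 0x28. The struct below is a hypothetical user-space mirror of just those fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct page;                      /* opaque, as in the kernel */
typedef uint64_t kernel_sector_t; /* sector_t is 64 bits on ppc64 */

/* Hypothetical mirror of the first fields of struct buffer_head; offsets
 * in the comments assume an LP64 target such as ppc64. */
struct buffer_head_prefix {
    unsigned long b_state;                  /* 0x00 */
    struct buffer_head_prefix *b_this_page; /* 0x08 */
    struct page *b_page;                    /* 0x10 */
    kernel_sector_t b_blocknr;              /* 0x18 */
    size_t b_size;                          /* 0x20 */
    char *b_data;                           /* 0x28 */
};
```
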
I agree that this is likely to be caused by an endianness bug.
A good help for finding endianness bugs is to use __be64-like data types
everywhere and test with sparse -D__CHECK_ENDIAN__.
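
A minimal user-space sketch of that annotation technique, modelled on how the kernel defines __bitwise: when sparse runs (it defines __CHECKER__), be64 becomes a distinct type, so assigning a plain integer to it or doing arithmetic on it without conversion should be reported, while a normal compiler just sees uint64_t. The BITWISE macro and be64 typedef are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#ifdef __CHECKER__
#define BITWISE __attribute__((bitwise))
#else
#define BITWISE
#endif

/* On-disk big-endian 64-bit value; only the helpers below may touch it. */
typedef uint64_t BITWISE be64;

static inline be64 host_to_be64(uint64_t x)
{
    uint8_t b[8];
    be64 r;

    for (int i = 0; i < 8; i++)
        b[i] = (uint8_t)(x >> (56 - 8 * i));   /* most significant first */
    memcpy(&r, b, sizeof r);
    return r;
}

static inline uint64_t be64_to_host(be64 v)
{
    uint8_t b[8];
    uint64_t r = 0;

    memcpy(b, &v, sizeof b);
    for (int i = 0; i < 8; i++)
        r = (r << 8) | b[i];
    return r;
}
```

Running sparse over code that writes something like `be64 x = 5;` should then produce a type warning, whereas going through host_to_be64() does not.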

Arnd 


Re: [PATCH 06/10] AXFS: axfs_super.c

2008-08-22 Thread Arnd Bergmann
On Friday 22 August 2008, Jared Hulbert wrote:
  This implies for block devices that the entire filesystem metadata has to be
  cached in RAM.  This severely limits the size of AXFS filesystems when using
  block devices, or else memory usage will be excessive.
 
 This is where 64bit squashfs could be a better fit.

Is this the only place where squashfs has a significant advantage? 
If so, you might want to change it in axfs eventually to make the
decision easier for users ;-)

It certainly sounds like something for your medium-term TODO list,
although I wouldn't think of it as a show-stopper.

Arnd 


Re: [PATCH 05/10] AXFS: axfs_profiling.c

2008-08-22 Thread Arnd Bergmann
On Friday 22 August 2008, you wrote:
 You mean to take this off list?

No, I replied to your mail that was sent just to me.
Putting everyone back on now.

  In 3, you create files with sysfs_create_file, and are fairly limited
  with how you can use it. A structured file like you have in procfs
  would not be allowed. File names are fixed, directory names can
  be used to identify the mounted file systems. You can create symlinks
  between your directory and other things in sysfs.
 
 What do you mean a structured file wouldn't be allowed?  What's in them then?

sysfs files are meant to have just a single value. Some have a list of
values of the same type, but a file that needs a nontrivial parser
(even sscanf) is not allowed in sysfs, by convention.
There is also a technical limit: a sysfs file can hold at most a single
page, which makes it hard to export variable-size data.

  In 4, you write a whole file system like debugfs (it's not as hard
  as it sounds) and are free to do anything in there, but you can't
  easily symlink to sysfs.
 
 Argh.  No it might not be too bad to do, but it sounds like a
 maintenance hassle.  Sounds like the best option though.
 
 Why did we decide debugfs is a bad fit?

It's basically the same as debugfs -- actually I once started a patch
to make it a single function call to instantiate a debugfs-like
file system, but I never finished that patch.

debugfs is a bad idea here because it is not meant for stable interfaces
but rather ad-hoc stuff. In a distribution kernel, debugfs is supposed
to be empty.

  So where does a page show up in the profile if you have two identical
  files and both are mapped?
 
 In which ever file was actually read.  The kernel driver doesn't
 really know pages are redundant.

ok.

  Will the kernel map them to the same page
  but count the files separately, or will it show the same count for both?
 
 I count faults on pages in mmap() so I don't really care whether a
 page is mapped twice or just once.  I'll count it every time you fault
 it even if it's the same physical page.  It's the image builder's job
 to figure out if there are redundant pages.

ok, makes sense.
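The counting scheme described above can be sketched as one counter per page of each file. `struct file_profile` and its helpers are hypothetical, not the AXFS profiling code; the point is only that a page faulted through two different files is counted once in each file's profile, and finding redundant pages is left to the image builder.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-file profile: faults[i] counts faults on page i. */
struct file_profile {
	uint64_t nr_pages;
	uint32_t *faults;
};

static int profile_init(struct file_profile *p, uint64_t nr_pages)
{
	p->nr_pages = nr_pages;
	p->faults = calloc(nr_pages, sizeof(*p->faults));
	return p->faults ? 0 : -1;
}

/* Record one fault at page offset pgoff of this file; ignore bogus offsets. */
static void profile_fault(struct file_profile *p, uint64_t pgoff)
{
	if (pgoff < p->nr_pages)
		p->faults[pgoff]++;
}

static void profile_free(struct file_profile *p)
{
	free(p->faults);
	p->faults = NULL;
}
```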

I think there is still another option, which would be to generalize
the profiling interface so it can work with arbitrary file systems.
I'm sure that other people can benefit from that as well, e.g. for
optimizing boot times on disks. For such a general interface,
a per-file ioctl would fit best, and then file systems can implement
it if they want, or it can be moved into VFS.

Arnd 


Re: [PATCH 06/10] AXFS: axfs_super.c

2008-08-22 Thread Arnd Bergmann
On Friday 22 August 2008, Phillip Lougher wrote:
 1. Support for > 4GB filesystems.  In theory 2^64 bytes.
 2. Compressed metadata
 3. Inode timestamps
 4. Hard-link support, and correct nlink counts
 5. Sparse file support
 6. Support for . and .. in readdir
 7. Indexed directories for fast lookup
 8. NFS exporting
 9. No need to cache entire metadata in memory

 Squashfs has been optimised for block-based rotating media like hard 
 disks, CD-ROMs.  AXFS has been optimised for flash based media.  Squashfs 
 will outperform AXFS on rotating media, AXFS will outperform Squashfs on 
 flash based media.

Ok, thanks for the list. I'm sure that sparse files are already
part of AXFS, and among the other things, I would consider some
to be AXFS bugs rather than squashfs features (. and .. in readdir, in
particular), but I get the point.

 Squashfs and AXFS should be seen as complementary filesystems, and there 
 should be room in the Linux kernel for both.
 
 I don't see what your problem is here.  I think AXFS is an extremely 
 good filesystem and should be merged.  But I don't see why this should 
 lead to more Squashfs bashing.

Sorry, I didn't mean to be abusive. From first look, it appeared to do
everything that squashfs does, with less code, but you've made it clear
that there is need for both of them.
I would still expect axfs to replace cramfs for all practical purposes,
even though that was written by our Emperor Penguin ;-)

Arnd 


Re: [PATCH 05/10] AXFS: axfs_profiling.c

2008-08-22 Thread Arnd Bergmann
On Thursday 21 August 2008, Jared Hulbert wrote:
 1) same mount point -
 I don't see how this works without an ioctl.  I can't just make up
 files in my mounted filesystem.   You expect the mounted version to
 match input to the mkfs.  I'd not be happy with an ioctl.  You can
 just read it.
 
 2) sysfs -
 I agree with Carsten, I don't see how this fits in the sysfs hierarchy.
 
 3) debugfs -
 I don't know diddly about this.

Ok, so now yet another suggestion, which may sound a little strange:

oprofilefs

I believe you can use the oprofile infrastructure to record data
about file accesses, even independent of the file system you
are looking at.

It's probably a lot of work to get it right, but it would be worth it.

Arnd 


Re: [PATCH 02/10] AXFS: Kconfig and Makefiles

2008-08-21 Thread Arnd Bergmann
On Thursday 21 August 2008, Jared Hulbert wrote:
 The Kconfig edits and Makefiles required for AXFS.
 
 Signed-off-by: Jared Hulbert [EMAIL PROTECTED]

If you split this patch out from the ones adding the files, please make it
the *last* patch in the series, so that a later git-bisect through the
middle of your series cannot hit build errors.

Arnd 


Re: [PATCH 04/10] AXFS: axfs_inode.c

2008-08-21 Thread Arnd Bergmann
On Thursday 21 August 2008, Jared Hulbert wrote:
 +/* functions in other axfs files **/
 +int axfs_get_sb(struct file_system_type *, int, const char *, void *,
 + struct vfsmount *);
 +void axfs_kill_super(struct super_block *);
 +void axfs_profiling_add(struct axfs_super *, unsigned long, unsigned int);
 +int axfs_copy_mtd(struct super_block *, void *, u64, u64);
 +int axfs_copy_block(struct super_block *, void *, u64, u64);

*Never* put extern declarations into a .c file; that's what headers are for.
Otherwise, if you ever change the definition, the compiler gets no chance
to warn you about the mismatch.

 +/**/
 +static int axfs_readdir(struct file *, void *, filldir_t);
 +static int axfs_mmap(struct file *, struct vm_area_struct *);
 +static ssize_t axfs_file_read(struct file *, char __user *, size_t, loff_t *);
 +static int axfs_readpage(struct file *, struct page *);
 +static int axfs_fault(struct vm_area_struct *, struct vm_fault *);
 +static struct dentry *axfs_lookup(struct inode *, struct dentry *,
 +   struct nameidata *);
 +static int axfs_get_xip_mem(struct address_space *, pgoff_t, int, void **,
 + unsigned long *);

For style reasons, also please don't put static forward declarations anywhere,
but define the functions in the right order so you don't need them.

Arnd 


Re: [PATCH 05/10] AXFS: axfs_profiling.c

2008-08-21 Thread Arnd Bergmann
On Thursday 21 August 2008, David Woodhouse wrote:
 On Thu, 2008-08-21 at 10:44 +0200, Carsten Otte wrote:
 
  Exporting profiling data for a file system in another file system 
  (/proc) seems not very straightforward to me. I think it is worth 
  considering to export this information via the same mount point.
 
 I would have said sysfs, rather than 'the same mount point'.
 

Let me throw in debugfs as my preferred option. sysfs is for stable
interfaces, while profiling generally fits into the debugging category.

Arnd 


Re: [PATCH 04/10] AXFS: axfs_inode.c

2008-08-21 Thread Arnd Bergmann
On Thursday 21 August 2008, Jared Hulbert wrote:
 +	array_index = AXFS_GET_INODE_ARRAY_INDEX(sbi, ino_number);
 +	array_index += page->index;
 +
 +	node_index = AXFS_GET_NODE_INDEX(sbi, array_index);
 +	node_type = AXFS_GET_NODE_TYPE(sbi, array_index);
 +
 +	if (node_type == Compressed) {
 +		/* node is in compressed region */
 +		cnode_offset = AXFS_GET_CNODE_OFFSET(sbi, node_index);
 +		cnode_index = AXFS_GET_CNODE_INDEX(sbi, node_index);
 +		down_write(&sbi->lock);
 +		if (cnode_index != sbi->current_cnode_index) {
 +			/* uncompress only necessary if different cblock */
 +			ofs = AXFS_GET_CBLOCK_OFFSET(sbi, cnode_index);
 +			len = AXFS_GET_CBLOCK_OFFSET(sbi, cnode_index + 1);
 +			len -= ofs;
 +			axfs_copy_data(sb, cblk1, &(sbi->compressed), ofs, len);
 +			axfs_uncompress_block(cblk0, cblk_size, cblk1, len);
 +			sbi->current_cnode_index = cnode_index;
 +		}
 +		downgrade_write(&sbi->lock);
 +		max_len = cblk_size - cnode_offset;
 +		len = max_len > PAGE_CACHE_SIZE ? PAGE_CACHE_SIZE : max_len;
 +		src = (void *)((unsigned long)cblk0 + cnode_offset);
 +		memcpy(pgdata, src, len);
 +		up_read(&sbi->lock);

This looks very nice, but could use some comments about how the data is
actually stored on disk. It took me some time to figure out that it
allows tail merging into compressed blocks, which I was about to suggest
you implement ;-). Cramfs doesn't have tail merging, and I found that it is
the main reason why squashfs compresses better than cramfs, besides the
default block size, which you can change on either one.
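A rough userspace model of the quoted read path may make the layout concrete. All names and sizes (`struct cregion`, `read_cnode()`, `CBLK_SZ`) are hypothetical, and `fake_uncompress()` merely copies bytes in place of real decompression. Two file tails stored at different offsets inside one compressed block (cblock) share a single decompression, and the cached index skips decompression when consecutive reads hit the same cblock. The sketch assumes a node never crosses a cblock boundary.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 4096u
#define CBLK_SZ 16384u	/* hypothetical uncompressed cblock size */

/*
 * Hypothetical view of the compressed region:
 *  - bytes cblock_offset[i] .. cblock_offset[i + 1] hold cblock i
 *  - a node is (cnode_index, cnode_offset): which cblock, and where
 *    inside the uncompressed cblock its data starts
 */
struct cregion {
	const uint8_t *image;		/* compressed region bytes */
	const uint64_t *cblock_offset;	/* nr_cblocks + 1 entries */
	uint8_t cache[CBLK_SZ];		/* last uncompressed cblock */
	uint64_t cache_len;		/* valid bytes in cache */
	int64_t cached_index;		/* -1 if cache is empty */
	unsigned int decompressions;	/* for illustration only */
};

/* Stand-in for a real decompressor: "compressed" data is stored raw. */
static uint64_t fake_uncompress(uint8_t *dst, uint64_t dst_size,
				const uint8_t *src, uint64_t len)
{
	if (len > dst_size)
		len = dst_size;
	memcpy(dst, src, len);
	return len;
}

/* Copy up to one page of the node at (cnode_index, cnode_offset). */
static uint64_t read_cnode(struct cregion *r, uint64_t cnode_index,
			   uint64_t cnode_offset, uint8_t *page)
{
	uint64_t max_len;

	/* Decompress only if the wanted cblock is not the cached one. */
	if ((int64_t)cnode_index != r->cached_index) {
		uint64_t ofs = r->cblock_offset[cnode_index];
		uint64_t clen = r->cblock_offset[cnode_index + 1] - ofs;

		r->cache_len = fake_uncompress(r->cache, CBLK_SZ,
					       r->image + ofs, clen);
		r->cached_index = (int64_t)cnode_index;
		r->decompressions++;
	}

	if (cnode_offset >= r->cache_len)
		return 0;

	/* Assumed: the node lies entirely within this cblock. */
	max_len = r->cache_len - cnode_offset;
	if (max_len > PAGE_SZ)
		max_len = PAGE_SZ;
	memcpy(page, r->cache + cnode_offset, max_len);
	return max_len;
}
```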

Have you seen any benefit of the rwsem over a simple mutex? I would guess
that you can never even get into the situation where you get concurrent
readers since I haven't found a single down_read() in your code, only
downgrade_write().
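For comparison, here is a sketch of the same single-entry cache protected by one plain lock, using a userspace pthread mutex as a stand-in for a kernel `struct mutex`. `struct block_cache`, `fill_block()` and `cache_read()` are hypothetical. Because every caller may have to refill the cache, each one takes the lock exclusively; with no `down_read()` callers, downgrading to a read lock lets nobody else in early, so a mutex gives the same behavior.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical single-entry cache of the last uncompressed block. */
struct block_cache {
	pthread_mutex_t lock;
	int64_t current_index;	/* -1: empty */
	char data[64];
};

/* Stand-in for "decompress block idx into buf". */
static void fill_block(char *buf, size_t size, int64_t idx)
{
	memset(buf, 'a' + (int)(idx % 26), size);
}

/* Refill if needed and copy out, all under one exclusive lock. */
static void cache_read(struct block_cache *c, int64_t idx,
		       size_t off, char *dst, size_t len)
{
	pthread_mutex_lock(&c->lock);
	if (c->current_index != idx) {
		fill_block(c->data, sizeof(c->data), idx);
		c->current_index = idx;
	}
	memcpy(dst, c->data + off, len);	/* caller keeps off + len in range */
	pthread_mutex_unlock(&c->lock);
}
```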

Arnd 


Re: [PATCH 21/23] make section names compatible with -ffunction-sections -fdata-sections: v850

2008-07-02 Thread Arnd Bergmann
On Wednesday 02 July 2008, Denys Vlasenko wrote:
 This patch fixes v850 architecture.

For all I know, v850 has been broken and unmaintained for a few years now,
didn't someone have a patch to remove it entirely?

Arnd 


Re: [PATCH 21/23] make section names compatible with -ffunction-sections -fdata-sections: v850

2008-07-02 Thread Arnd Bergmann
On Thursday 03 July 2008, Andi Kleen wrote:
 Same seems to be true for cris btw.

Cris has seen significant updates in 2.6.25 by its maintainer.
It's not a very active port, but skipping updates for one kernel
version is on a completely different scale from doing nothing
at all for over three years as in the v850 case.

I don't currently see any architecture (other than v850) in a
state that justifies removing it entirely.

Arnd 