On Mon, 2019-08-26 at 14:43 +0200, Felix Brack wrote: > Hello Weijie, > > On 26.08.19 10:19, Weijie Gao wrote: > > On Thu, 2019-08-22 at 15:58 +0200, Felix Brack wrote: > >> On 11.07.19 09:10, Weijie Gao wrote: > >> > >>> Some storage devices have multiple hw partitions and both address from > >>> zero, for example eMMC. > >>> However currently block cache invalidation only applies to block > >>> write/erase. > >>> This can cause a problem that data of current hw partition is cached > >>> before switching to another hw partition. And the following read > >>> operation of the latter hw partition will get wrong data when reading > >>> from the addresses that have been cached previously. > >>> > >>> To solve this problem, invalidate block cache after a successful > >>> select_hwpart operation. > >>> > >>> Signed-off-by: Weijie Gao <[email protected]> > >>> --- > >> This patch breaks correct operation of PDU001 board. > >> > >>> drivers/block/blk-uclass.c | 14 ++++++++++++-- > >>> 1 file changed, 12 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/drivers/block/blk-uclass.c b/drivers/block/blk-uclass.c > >>> index baaf431e5e0..c23b6682a6c 100644 > >>> --- a/drivers/block/blk-uclass.c > >>> +++ b/drivers/block/blk-uclass.c > >>> @@ -208,7 +208,11 @@ int blk_select_hwpart_devnum(enum if_type if_type, > >>> int devnum, int hwpart) > >>> if (ret) > >>> return ret; > >>> > >>> - return blk_select_hwpart(dev, hwpart); > >>> + ret = blk_select_hwpart(dev, hwpart); > >>> + if (!ret) > >>> + blkcache_invalidate(if_type, devnum); > >>> + > >>> + return ret; > >>> } > >>> > >>> int blk_list_part(enum if_type if_type) > >>> @@ -348,7 +352,13 @@ int blk_select_hwpart(struct udevice *dev, int > >>> hwpart) > >>> > >>> int blk_dselect_hwpart(struct blk_desc *desc, int hwpart) > >>> { > >>> - return blk_select_hwpart(desc->bdev, hwpart); > >>> + int ret; > >>> + > >>> + ret = blk_select_hwpart(desc->bdev, hwpart); > >>> + if (!ret) > >>> + blkcache_invalidate(desc->if_type, desc->devnum); > >>> + > >> Commenting the invalidation of the block cache on this line results in > >> proper working of the board. > >> > >>> + return ret; > >>> } > >>> > >>> int blk_first_device(int if_type, struct udevice **devp) > >>> > >> With the patch active, files from the boot FAT partition of the SD-card > >> (mmc device 1) do not load anymore, hence booting fails. > >> Using the eMMC (mmc device 0) instead works fine, files can bee loaded > >> and the board boots. > >> > >> To isolate the problem I have modified the configuration to only load a > >> 10k test file (test.bin) instead of loading the DTB and zImage files > >> followed by booting LINUX. Furthermore I have enabled debugging for some > >> parts of the code. > >> > >> When I boot from the SD-card (which fails) I get the following logged: > >> === > >> CPU : AM335X-GP rev 1.0 > >> === > >> > >> This time the file test.bin gets loaded correctly. > >> > >> To be honest I don't really see why things are going wrong. It looks > >> like the block cache gets filled but then again invalidated before it's > >> data is used. I also feel that this problem is related to the SD-card > >> not being the first (number 0) but the second device (number 1). > >> Can anybody shed some light on this and give me a hint? > >> > >> many thanks, Felix > > > > Hi Felix, > > > > I've found the root cause. > > > Many thanks for looking into this! > > > There is a bug in the FAT driver. In function file_fat_read_at(), > > malloc_cache_aligned() allocates memory for the itr. The itr is a > > pointer to struct fat_itr. fat_itr has a member block[MAX_CLUSTSIZE] > > used for storing blocks read from MMC. > > > > The FAT driver resolves the file table and stores the required dir entry > > in the member block. Then the pointer of the dir entry is passed to > > get_contents() to get the contents of the file. > > > > However the itr is freed BEFORE the call to get_contents(). This means > > the contents dir entry points to is no longer valid. And it results to > > your situation. > > > Indeed this is the root of all trouble. The freeing of itr of course > also invalidates the assignment made earlier: > > dir_entry *dentptr = itr->dent; > > As dentptr is used in the call to get_contents() the evil takes its > course... > > Again many thanks for analyzing and explaining this as detailed as you did. > > > But this only happens when CONFIG_BLOCK_CACHE=y and the patch applied. > > I've also found its cause. > > > > The MMC driver calls blk_dselect_hwpart() unconditionally in > > mmc_bread(). This causes block cache being invalidated and refilled many > > times during the FAT load operation. > > > Yes I have noticed this frequent invalidations of the cache due to calls > to blk_dselect_hwpart(). Some optimization, if possible, would be great. > > > The frequent block cache invalidation and refilling finally results in > > frequent memory allocation and freeing. There is a mechanism in the > > dlmalloc.c: if the total free space reaches a threshold, the > > malloc_trim() will be called. malloc_trim() calls sbrk(), which > > fills the memory being freed with zeros and of course the data of itr is > > damaged, otherwise the original data remains unchanged. > > > > So this is the reason the bug only appears when block cache is enabled > > and cache invalidation is called after every select_hwpart. > > > I see and now understand the different behavior of eMMC and SD-card. > > > I have made a patch, could you please have a try? I've tested it worked. > > If it works for you too, I will submit it to u-boot. And I will submit > > another patch to optimize the cache invalidation for select_hwpart. > > > > --- a/fs/fat/fat.c > > +++ b/fs/fat/fat.c > > @@ -1174,10 +1174,6 @@ int file_fat_read_at(const char *filename, loff_t > > pos, void *buffer, > > /* For saving default max clustersize memory allocated to malloc pool > > */ > > dir_entry *dentptr = itr->dent; > > > > - free(itr); > > - > > - itr = NULL; > > - > > ret = get_contents(&fsdata, dentptr, pos, buffer, maxsize, actread); > > > > out_free_both: > > > I can confirm that your patch works perfect, thanks! > > Please CC me when submitting the two patches you mentioned above. It > will be my pleasure to test them on my hardware. > > > > > BTW, I reproduced this failure on a MT7623 board, which also has an eMMC > > and a SD. And this patch is indeed used for MT7623 to solve the bug on > > switching hwpart on eMMC. > > > > > > Best Regards, > > > > Weijie > > > > regard, Felix
Hi Felix, I found there's already a commit merged 6 hours ago (mailed on Aug 20) that fixed this issue: d7af2a863017be1f5fd1b65a858ddc7e87d7b876 So there's no need to send patch again. What you need is to pull the latest master branch. I will still send a patch to optimize the invalidation in select_hwpart. Best Regards, Weijie _______________________________________________ U-Boot mailing list [email protected] https://lists.denx.de/listinfo/u-boot

