Re: FCB revamp
Hi Marko and Will,

I have experienced the same problem while developing nvs for Zephyr. I always got stuck when I would expect changes to flash by an "external" factor. As soon as this can happen, there is a problem with the meaning of the crc. A crc failure can mean:
a. an incomplete write,
b. some tampering occurred,
c. the len is not pointing to the correct location.
In all these cases the result is that the data is not considered valid; the only case where an increased crc might help is b.

In the latest version of nvs I have changed how the data is written to flash. The data itself is written from the start of a sector; an "allocation entry" is written from the end of the flash sector. This allocation entry contains a sector offset and the length of the data written, and is of fixed size. Data is written using the following method:
1. Check that there is sufficient space (the difference between the allocation entry location and the next data location).
2. Write the data at the specific location; this could include a data crc.
3. Write the allocation entry, which includes a crc calculated over the item length and offset.

When there is insufficient space, an allocation entry of all zeros is written at the end of the allocation entries and a new sector is used.

At startup, the allocation entry write position is found by stepping through the allocation entries until an all-0xff entry is found. The data write position is found by searching for the position from which all data up to the allocation entry write position is 0xff.

The disadvantage is of course that the allocation entries take up some space (only the offset is added).

Kind regards,

Jehudi

On Mon 30 Jul 2018 at 23:16, Cufi, Carles wrote:
>
> + Andrzej
>
> On 30/07/2018, 18:45, "will sanfilippo" wrote:
>
> I guess you could also use some error correction/detection coding scheme as well if you wanted to try and actually recover data that had errors in it (but that seems a bit much). Another thing I have seen done is that just the header has FEC on it. This way, you can almost always get the proper length. Again, not sure if any of this is warranted.
>
> Will
>
> > On Jul 30, 2018, at 6:44 AM, marko kiiskila wrote:
> >
> > Hi,
> >
> >> On Jul 30, 2018, at 1:47 PM, Laczen JMS wrote:
> >>
> >> Hi Marko and Will,
> >>
> >> I have been studying fcb and I think you can leave it at 8 bit as it was. The crc can only be interpreted as a check that the closing was done properly. As soon as the crc check fails, this only means that the closing failed. It could just as well be fixed to zero instead of a crc.
> >
> > That is a valid point. If you can't trust your data path to flash, what can you trust?
> >
> >> When writing errors (bit errors) occur, fcb would need a lot more protection: the length, which is written first, could no longer be trusted and it would be impossible to find the crc. The writing of the length can also be a more problematic case to solve; what happens when the write of the length fails and the length that is read points outside the sector?
> >
> > On the other hand, there are flashes where multiple write cycles to the same location are allowed; you can keep turning more bits to zero. There you can corrupt the data after writing. And on some, the default, unwritten state is 0 (this we need to address for FCB and image management).
> >
> > You're right; adding mechanisms to detect corruption of the length field, for example, starts to get complicated and costly. Recovery is easier to do, however, if we use a stronger version of CRC, i.e. if the CRC does not match, then just go forward a byte at a time until it does.
> >
> >> Kind regards,
> >>
> >> Jehudi
> >>
> >> On Mon 30 Jul 2018 at 11:10, marko kiiskila wrote:
> >>>
> >>> Thanks for the response, Will.
> >>>
> >>> I made it one byte because the scenario where the CRC check strength comes into play is somewhat rare; it is to detect partial writes, i.e. when fcb_append_finish() fails to get called. This, presumably, would only happen on a crash/power outage within a specific time window. This is not used as an error detection mechanism on a channel where we expect bit errors.
> >>>
> >>> The way I did it was I added 2 syscfg knobs to control which CRC is included in the build, in case you get really tight on code space. Of course, newtmgr uses CRC16, so if you have that enabled, there is no gain.
> >>> There are 3 different options when starting FCB: inherit from flash, force 16 bit, or force 8 bits. If the flash region has not been initialized with anything, then the '
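The append and startup-recovery method Jehudi describes can be sketched in C over an in-memory sector. The `struct ate` field layout, sizes, and names here are assumptions for illustration, not the actual nvs on-flash format:

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 256

/* Fixed-size allocation entry; illustrative layout, not the nvs format. */
struct ate {
    uint16_t offset;   /* data offset from the sector start */
    uint16_t len;      /* length of the data item */
    uint8_t  crc;      /* crc8 over offset and len */
    uint8_t  pad[3];   /* filler, left erased (0xff) */
};

#define ATE_SIZE sizeof(struct ate)

static uint8_t sector[SECTOR_SIZE];   /* simulated flash sector */
static uint16_t data_wra;             /* next data write position */
static uint16_t ate_wra;              /* next allocation entry slot */

/* crc8, polynomial 0x07, init 0 (an assumed polynomial). */
static uint8_t crc8(const uint8_t *p, size_t n)
{
    uint8_t c = 0;
    while (n--) {
        c ^= *p++;
        for (int i = 0; i < 8; i++)
            c = (c & 0x80) ? (uint8_t)((c << 1) ^ 0x07) : (uint8_t)(c << 1);
    }
    return c;
}

static void sector_init(void)
{
    memset(sector, 0xff, sizeof(sector));   /* erased flash reads 0xff */
    data_wra = 0;
    ate_wra = SECTOR_SIZE - ATE_SIZE;
}

/* Append one item: 1. space check, 2. write data, 3. write the entry.
 * The entry is written last, so it acts as the commit record. */
static int nvs_append(const void *data, uint16_t len)
{
    if (data_wra + len > ate_wra)
        return -1;                          /* 1. insufficient space */

    memcpy(&sector[data_wra], data, len);   /* 2. data first */

    struct ate e = { .offset = data_wra, .len = len };
    e.crc = crc8((const uint8_t *)&e, 4);   /* crc over offset + len */
    memset(e.pad, 0xff, sizeof(e.pad));
    memcpy(&sector[ate_wra], &e, ATE_SIZE); /* 3. entry commits the item */

    data_wra += len;
    ate_wra -= (uint16_t)ATE_SIZE;
    return 0;
}

/* Startup: step down through the entries until an all-0xff slot is
 * found, then scan back from it for the last non-0xff data byte. */
static void nvs_recover(void)
{
    ate_wra = SECTOR_SIZE - ATE_SIZE;
    while (ate_wra > 0) {
        int erased = 1;
        for (size_t i = 0; i < ATE_SIZE; i++)
            if (sector[ate_wra + i] != 0xff)
                erased = 0;
        if (erased)
            break;
        ate_wra -= (uint16_t)ATE_SIZE;
    }
    data_wra = ate_wra;
    while (data_wra > 0 && sector[data_wra - 1] == 0xff)
        data_wra--;
}
```

After a simulated reboot, `nvs_recover()` lands on the same write positions that appending produced, which is the property the scheme relies on.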
Re: FCB revamp
Interesting; I see what this method does for you. I still think there is a fairly high chance of getting a false positive with a 1-byte crc, but that might not be a huge issue. Anyway, thanks for this!

Will

> On Jul 31, 2018, at 12:47 AM, Laczen JMS wrote:
> [snip]
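Will's false-positive concern can be made concrete. A CRC is linear, and varying the last byte of a message steers the register to every possible value, so exactly 1 in 256 corrupted payloads passes an 8-bit check (versus 1 in 65536 for CRC16). A small exhaustive check over all 2-byte messages, using an illustrative crc8 (poly 0x07, not necessarily the one FCB uses):

```c
#include <stdint.h>
#include <stddef.h>

/* crc8, polynomial 0x07, init 0 (illustrative polynomial). */
static uint8_t crc8(const uint8_t *p, size_t n)
{
    uint8_t c = 0;
    while (n--) {
        c ^= *p++;
        for (int i = 0; i < 8; i++)
            c = (c & 0x80) ? (uint8_t)((c << 1) ^ 0x07) : (uint8_t)(c << 1);
    }
    return c;
}

/* Count the 2-byte messages whose crc8 equals a fixed value (0x00):
 * exactly 65536 / 256 = 256, i.e. a random corruption passes an
 * 8-bit check with probability 1/256. */
static int crc8_collisions(void)
{
    int count = 0;
    for (unsigned m = 0; m < 0x10000; m++) {
        uint8_t buf[2] = { (uint8_t)(m >> 8), (uint8_t)m };
        if (crc8(buf, 2) == 0x00)
            count++;
    }
    return count;
}
```

Whether 1/256 is acceptable depends on how often the check is actually exercised; as Marko notes, it only matters on a crash or power loss inside a narrow window.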
Re: FCB revamp
Hi,

> On 31 Jul 2018, at 0:47, Laczen JMS wrote:
>
> Hi Marko and Will,
>
> I have experienced the same problem while developing nvs for Zephyr. I always got stuck when I would expect changes to flash by an "external" factor. As soon as this can happen, there is a problem with the meaning of the crc. A crc failure can mean:
> a. an incomplete write,
> b. some tampering occurred,
> c. the len is not pointing to the correct location.
> In all these cases the result is that the data is not considered valid; the only case where an increased crc might help is b.

I think (b) is more commonly failure of flash over time, and writes and/or erases that fail. Sometimes bits will be "stuck" on flashes, especially inexpensive ones in high-volume consumer products that have no correction for this. The probability that this will happen across a population of millions of devices is probably pretty high, and the bugs are hard to find.

> In the latest version of nvs I have changed how the data is written to flash. The data itself is written from the start of a sector; an "allocation entry" is written from the end of the flash sector. This allocation entry contains a sector offset and the length of the data written, and is of fixed size. Data is written using the following method:
> 1. Check that there is sufficient space (the difference between the allocation entry location and the next data location).
> 2. Write the data at the specific location; this could include a data crc.
> 3. Write the allocation entry, which includes a crc calculated over the item length and offset.
>
> When there is insufficient space, an allocation entry of all zeros is written at the end of the allocation entries and a new sector is used.
>
> At startup, the allocation entry write position is found by stepping through the allocation entries until an all-0xff entry is found. The data write position is found by searching for the position from which all data up to the allocation entry write position is 0xff.
This is a good way of accomplishing this; another is to use a magic number on every entry.

It would be interesting to allow streaming of entries using this approach: fixed-length entries that contain type, offset, and len, perhaps encoded more efficiently. We've been discussing adding a flash store interface to Mynewt similar to NVS, and one of the requirements is definitely to not require the full amount of RAM to write a larger BLOB entry to the flash.

Sterling
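A hypothetical sketch of what such a streaming append could look like. All names here (`fs_write_begin` and friends) and the in-memory sector are assumptions for illustration, not an existing Mynewt or Zephyr API; the point is that chunks go straight to the flash sector as they arrive, so only the current chunk and the fixed-size entry bookkeeping live in RAM:

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 128

static uint8_t sector[SECTOR_SIZE];   /* simulated flash sector */
static uint16_t data_wra;             /* next data write position */

/* Per-stream state: only this, not the whole blob, is held in RAM. */
struct fs_stream {
    uint16_t offset;    /* where this blob starts */
    uint16_t written;   /* bytes flushed to flash so far */
};

static void fs_write_begin(struct fs_stream *s)
{
    s->offset = data_wra;
    s->written = 0;
}

/* Flush one chunk straight to the sector. */
static int fs_write_chunk(struct fs_stream *s, const void *chunk, uint16_t len)
{
    if (s->offset + s->written + len > SECTOR_SIZE)
        return -1;                     /* would overrun the sector */
    memcpy(&sector[s->offset + s->written], chunk, len);
    s->written += len;
    return 0;
}

/* Commit: in a real store this is where the fixed-size entry
 * (type, offset, len, crc) would be appended at the sector end. */
static void fs_write_end(struct fs_stream *s)
{
    data_wra = s->offset + s->written;
}
```

The fixed-size entry written at commit is what makes the blob visible, so a crash mid-stream leaves only unreferenced data bytes, which the startup scan treats as garbage.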