Re: FCB revamp

2018-07-31 Thread Laczen JMS
Hi Marko and Will,

I have experienced the same problem while developing nvs for zephyr. I
always got stuck when I would expect that there would be changes to
flash by an "external" factor. As soon as this can happen there is a
problem with the meaning of the crc. A crc failure can mean:
a. an incomplete write,
b. some tampering occurred,
c. the len is not pointing to the correct location.
In all these cases the result is that the data is not considered
valid; the only case where an increased crc might help is b.

In the latest version of nvs I have changed how the data is written to
flash. The data itself is written from the start of a sector, and an
"allocation entry" is written from the end of the flash sector. This
allocation entry contains a sector offset and the length of the data
written. The allocation entry is of fixed size. Writing data is done
using the following method:
1. Check that there is sufficient space (the difference between the
allocation entry location and the next data location),
2. Write the data at the specific location; this could include a data crc,
3. Write the allocation entry; this includes a crc calculated over the
item length and offset.

When there is insufficient space, an allocation entry of all zeros is
written at the end of the allocation entries, and a new sector is
used.
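The three steps above can be sketched in C against an in-RAM image of one sector. The entry layout, field widths, and crc-8 polynomial below are my assumptions for illustration, not the actual nvs code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 256

/* hypothetical fixed-size allocation entry; field widths are assumptions */
struct ate {
    uint16_t offset;   /* data offset from the start of the sector */
    uint16_t len;      /* length of the data item */
    uint8_t  crc8;     /* crc over offset and len only */
};

/* simple crc-8 (polynomial 0x31); nvs may use a different one */
static uint8_t crc8(const uint8_t *p, size_t n)
{
    uint8_t crc = 0xff;

    while (n--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x31)
                               : (uint8_t)(crc << 1);
    }
    return crc;
}

struct sector {
    uint8_t buf[SECTOR_SIZE]; /* erased flash reads back as 0xff */
    uint16_t data_pos;        /* next data byte, grows up from 0 */
    uint16_t ate_pos;         /* next entry slot, grows down from the end */
};

static void sector_init(struct sector *s)
{
    memset(s->buf, 0xff, sizeof(s->buf));
    s->data_pos = 0;
    s->ate_pos = SECTOR_SIZE - sizeof(struct ate);
}

/* returns 0 on success, -1 when the caller must move to a new sector */
static int sector_append(struct sector *s, const void *data, uint16_t len)
{
    struct ate e;

    /* 1. space check: the gap between data area and allocation entries */
    if ((size_t)s->data_pos + len + sizeof(e) > s->ate_pos)
        return -1;

    /* 2. write the data itself */
    memcpy(&s->buf[s->data_pos], data, len);

    /* 3. write the allocation entry; crc covers offset and len */
    e.offset = s->data_pos;
    e.len = len;
    e.crc8 = crc8((const uint8_t *)&e, offsetof(struct ate, crc8));
    memcpy(&s->buf[s->ate_pos], &e, sizeof(e));

    s->data_pos += len;
    s->ate_pos -= sizeof(e);
    return 0;
}
```

Because the crc only covers the fixed-size entry, a torn entry write is caught without having to trust a length field stored in front of variable-size data.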

At startup the allocation entry write position is found by stepping
through the allocation entries until an entry that is all 0xff is
found. The data write position is found by searching for the position
from which all data up to the allocation entry write position is 0xff.
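A sketch of that startup scan, again over an in-RAM image of one sector; the fixed entry size and the 0xff erased value are assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 256
#define ATE_SIZE    6      /* assumed fixed allocation-entry size */

static int is_erased(const uint8_t *p, size_t n)
{
    while (n--)
        if (*p++ != 0xff)
            return 0;
    return 1;
}

/* step down through the allocation entries until an erased slot is found */
static uint16_t find_ate_pos(const uint8_t *sec)
{
    uint16_t pos = SECTOR_SIZE - ATE_SIZE;

    while (pos >= ATE_SIZE && !is_erased(&sec[pos], ATE_SIZE))
        pos -= ATE_SIZE;
    return pos;
}

/* search backwards from the entry write position for the point from which
 * everything up to it still reads as erased (0xff) */
static uint16_t find_data_pos(const uint8_t *sec, uint16_t ate_pos)
{
    uint16_t pos = ate_pos;

    while (pos > 0 && sec[pos - 1] == 0xff)
        pos--;
    return pos;
}
```

One caveat with the backwards data scan: data that legitimately ends in 0xff bytes shifts the recovered write position, which is harmless here because the gap is still erased and writable.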

The disadvantage is of course that the allocation entries take up some
space (only the offset is added).

Kind regards,

Jehudi

On Mon 30 Jul 2018 at 23:16, Cufi, Carles wrote:
>
> + Andrzej
>
> On 30/07/2018, 18:45, "will sanfilippo"  wrote:
>
> I guess you could also use some error correction/detection coding scheme 
> as well if you wanted to try and actually recover data that had errors in it 
> (but that seems a bit much). Another thing I have seen done is that just the 
> header has FEC on it. This way, you can almost always get the proper length. 
> Again, not sure if any of this is warranted.
>
> Will
> > On Jul 30, 2018, at 6:44 AM, marko kiiskila  wrote:
> >
> > Hi,
> >
> >> On Jul 30, 2018, at 1:47 PM, Laczen JMS  wrote:
> >>
> >> Hi Marko and Will,
> >>
> >> I have been studying fcb and I think you can leave it at 8 bit as it
> >> was. The crc can only be interpreted as a check that
> >> the closing was done properly. As soon as the crc check fails this
> >> only means that the closing failed. It could just as well
> >> be fixed to zero instead of a crc.
> >
> > That is a valid point. If you can’t trust your data path to flash, what 
> can you trust?
> >
> >>
> >> When writing errors (bit errors) would occur fcb would need a lot more
> >> protection, the length which is written first could
> >> no longer be trusted and it would be impossible to find the crc. The
> >> writing of the length can also be a more problematic
> >> case to solve, what happens when the write of the length fails and the
> >> length that is read is pointing outside the sector ?
> >
> > On the other hand, there are flashes where multiple write cycles to
> > same location are allowed; you can keep turning more bits to zero.
> > There you can corrupt the data after writing. And on some the default,
> > unwritten state is 0 (this we need to address for FCB, image 
> management).
> >
> > You’re right; adding mechanisms to detect corruption of length field, 
> for
> > example, starts to get complicated and costly. Recovery is easier to do,
> > however, if we use a stronger version of CRC. I.e. if CRC does not 
> match,
> > then just go forward a byte at a time until it does.
> >
> >
> >>
> >> Kind regards,
> >>
> >> Jehudi
> >>
> >>
> >> On Mon 30 Jul 2018 at 11:10, marko kiiskila wrote:
> >>>
> >>> Thanks for the response, Will.
> >>>
> >>> I made it one-byte because the scenario where the CRC check strength 
> comes into play is somewhat rare;
> >>> it is to detect partial writes. I.e. when fcb_append_finish() fails 
> to get called. This, presumably, would
> >>> only happen on crash/power outage within a specific time window. 
> This is not used as an error detection
> >>> mechanism on a channel where we expect bit errors.
> >>>
> >>> The way I did it was I added 2 syscfg knobs to control which CRC is 
> included in the build. In case you get
> >>> really tight on code space. Of course, newtmgr uses CRC16, so if you 
> have that enabled,
> >>> there is no gain.
> >>> There’s 3 different options when starting FCB: inherit from flash, 
> force 16 bit, or force 8 bits. If the flash region
> >>> has not been initialized with anything, then the ‘

Re: FCB revamp

2018-07-31 Thread will sanfilippo
Interesting; I see what this method does for you. I still think there is a 
fairly high chance of getting a false positive with a 1 byte crc but that might 
not be a huge issue.

Anyway, thanks for this!

Will

> On Jul 31, 2018, at 12:47 AM, Laczen JMS  wrote:
> 
> Hi Marko and Will,
> 
> I have experienced the same problem while developing nvs for zephyr. I
> always got stuck when I would expect that there would be changes to
> flash by an "external" factor. As soon as this can happen there is a
> problem with the meaning of the crc. A crc failure can mean:
> a. an incomplete write,
> b. some tampering occurred,
> c. the len is not pointing to the correct location.
> In all these cases the result is that the data is not considered
> valid; the only case where an increased crc might help is b.
> 
> In the latest version of nvs I have changed how the data is written to
> flash. The data itself is written from the start of a sector, an
> "allocation entry" is written from the end of a flash sector. This
> allocation entry contains a sector offset and length of the data
> written.  The allocation entry is of fixed size. The writing of data
> is done using the following method:
> 1. Check if there is sufficient space (difference between allocation
> entry location and next data location),
> 2. Write the data at the specific location, this could include a data crc,
> 3. Write the allocation entry, this includes a crc calculated over the
> item length and offset,
> 
> When there is insufficient space an allocation entry of all zeros is
> written at the end of the allocation entries, and a new sector is
> used.
> 
> At startup the allocation entry write position is found by stepping
> through the allocation entries until all 0xff are found. The data
> write position is found by searching the position from which all data
> up to the allocation entry write position is 0xff.
> 
> The disadvantage is of course that the allocation entries take up some
> space (only the offset is added).
> 
> Kind regards,
> 
> Jehudi
> 
> On Mon 30 Jul 2018 at 23:16, Cufi, Carles wrote:
>> 
>> + Andrzej
>> 
>> On 30/07/2018, 18:45, "will sanfilippo"  wrote:
>> 
>>I guess you could also use some error correction/detection coding scheme 
>> as well if you wanted to try and actually recover data that had errors in it 
>> (but that seems a bit much). Another thing I have seen done is that just the 
>> header has FEC on it. This way, you can almost always get the proper length. 
>> Again, not sure if any of this is warranted.
>> 
>>Will
>>> On Jul 30, 2018, at 6:44 AM, marko kiiskila  wrote:
>>> 
>>> Hi,
>>> 
 On Jul 30, 2018, at 1:47 PM, Laczen JMS  wrote:
 
 Hi Marko and Will,
 
 I have been studying fcb and I think you can leave it at 8 bit as it
 was. The crc can only be interpreted as a check that
 the closing was done properly. As soon as the crc check fails this
 only means that the closing failed. It could just as well
 be fixed to zero instead of a crc.
>>> 
>>> That is a valid point. If you can’t trust your data path to flash, what can 
>>> you trust?
>>> 
 
 When writing errors (bit errors) would occur fcb would need a lot more
 protection, the length which is written first could
 no longer be trusted and it would be impossible to find the crc. The
 writing of the length can also be a more problematic
 case to solve, what happens when the write of the length fails and the
 length that is read is pointing outside the sector ?
>>> 
>>> On the other hand, there are flashes where multiple write cycles to
>>> same location are allowed; you can keep turning more bits to zero.
>>> There you can corrupt the data after writing. And on some the default,
>>> unwritten state is 0 (this we need to address for FCB, image management).
>>> 
>>> You’re right; adding mechanisms to detect corruption of length field, for
>>> example, starts to get complicated and costly. Recovery is easier to do,
>>> however, if we use a stronger version of CRC. I.e. if CRC does not match,
>>> then just go forward a byte at a time until it does.
>>> 
>>> 
 
 Kind regards,
 
 Jehudi
 
 
 On Mon 30 Jul 2018 at 11:10, marko kiiskila wrote:
> 
> Thanks for the response, Will.
> 
> I made it one-byte because the scenario where the CRC check strength 
> comes into play is somewhat rare;
> it is to detect partial writes. I.e. when fcb_append_finish() fails to 
> get called. This, presumably, would
> only happen on crash/power outage within a specific time window. This is 
> not used as an error detection
> mechanism on a channel where we expect bit errors.
> 
> The way I did it was I added 2 syscfg knobs to control which CRC is 
> included in the build. In case you get
> really tight on code space. Of course, newtmgr uses CRC16, so if you have 
> that enabled,
> there is no gain.
> There

Re: FCB revamp

2018-07-31 Thread Sterling Hughes

Hi,

On 31 Jul 2018, at 0:47, Laczen JMS wrote:


Hi Marko and Will,

I have experienced the same problem while developing nvs for zephyr. I
always got stuck when I would expect that there would be changes to
flash by an "external" factor. As soon as this can happen there is a
problem with the meaning of the crc. A crc failure can mean:
a. an incomplete write,
b. some tampering occurred,
c. the len is not pointing to the correct location.
In all these cases the result is that the data is not considered
valid; the only case where an increased crc might help is b.


I think (b) is more commonly failure of flash over time, and writes 
and/or erases that fail.  Sometimes bits will be “stuck” on 
flashes, especially inexpensive ones in high-volume consumer products 
that have no correction for this.  The probability that this will happen 
across a population of millions of devices is pretty high, and 
the bugs are hard to find.




In the latest version of nvs I have changed how the data is written to
flash. The data itself is written from the start of a sector, an
"allocation entry" is written from the end of a flash sector. This
allocation entry contains a sector offset and length of the data
written.  The allocation entry is of fixed size. The writing of data
is done using the following method:
1. Check if there is sufficient space (difference between allocation
entry location and next data location),
2. Write the data at the specific location, this could include a data crc,
3. Write the allocation entry, this includes a crc calculated over the
item length and offset,

When there is insufficient space an allocation entry of all zeros is
written at the end of the allocation entries, and a new sector is
used.

At startup the allocation entry write position is found by stepping
through the allocation entries until all 0xff are found. The data
write position is found by searching the position from which all data
up to the allocation entry write position is 0xff.



This is a good way of accomplishing this; another is to use a magic 
number on every entry.  It would be interesting to allow streaming of 
entries using this approach: fixed-length entries that contain type, 
offset, and len, perhaps encoded more efficiently.  We’ve been discussing 
adding a flash store interface to Mynewt similar to NVS, and one of the 
requirements is definitely to not require the full amount of RAM to 
write a larger BLOB entry to the flash.
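As a rough illustration only, the fixed-length streaming entry mentioned above might be laid out like this; the field names, widths, and the crc are guesses, not an agreed design:

```c
#include <stdint.h>

/* hypothetical fixed-length entry for a streamed flash store; every field
 * here is a guess at what "type, offset, len" could look like when packed */
struct fs_entry {
    uint8_t  type;     /* record type, e.g. first/middle/last chunk of a blob */
    uint8_t  crc8;     /* crc over the other three fields */
    uint16_t offset;   /* data offset within the sector */
    uint16_t len;      /* chunk length; a large blob spans many entries */
};
```

With chunked entries like this, a large BLOB could be written one chunk at a time as the data arrives, which would satisfy the requirement of not buffering the whole value in RAM.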


Sterling