[flashrom] Re: RFC: removing 1 second verification delay

2024-05-12 Thread Anastasia Klimchuk
Brian, thank you for your work! The patch is submitted.

> Honestly, I don't know how to move forward on assertions about
> "know[ing] for sure whether this delay is no longer needed". We have
> no actionable information about the original problem, so we can never
> be sure.

I have been thinking about it in the background. I feel it's a good
topic to discuss in the meeting.

-- 
Anastasia.
___
flashrom mailing list -- flashrom@flashrom.org
To unsubscribe send an email to flashrom-le...@flashrom.org


[flashrom] Re: RFC: removing 1 second verification delay

2024-05-06 Thread Brian Norris
On Mon, Apr 29, 2024 at 5:58 AM Anastasia Klimchuk  wrote:
> The patch saves 1s for every flashrom run, for almost all users, so
> it's a good thing. If it can be unblocked now just by adding 1-line
> condition, let's unblock it now.

I've updated the patch to approximately this suggestion.

> Perhaps there is a bigger question of Parallel/LPC/FWH flash chips
> which need special treatment, but rarely used nowadays. I want to
> discuss, but as a topic by itself.

Honestly, I don't know how to move forward on assertions about
"know[ing] for sure whether this delay is no longer needed". We have
no actionable information about the original problem, so we can never
be sure.

So for my purposes, I'll just drop the subject after this patch.

> I don't think we need to resolve all of the problems in one patch :)

Great!
___
flashrom mailing list -- flashrom@flashrom.org
To unsubscribe send an email to flashrom-le...@flashrom.org


[flashrom] Re: RFC: removing 1 second verification delay

2024-04-29 Thread Anastasia Klimchuk
Currently, BUS_NONSPI = BUS_PARALLEL | BUS_LPC | BUS_FWH

so it does not include BUS_PROG, and of course not BUS_SPI.
It seems to cover what is needed?

I also need to correct my little piece of code, it actually should be
if (flash->chip->bustype & BUS_NONSPI)
It seems like something everyone can agree on?

The patch saves 1s for every flashrom run, for almost all users, so
it's a good thing. If it can be unblocked now just by adding 1-line
condition, let's unblock it now.

Perhaps there is a bigger question of Parallel/LPC/FWH flash chips
which need special treatment, but rarely used nowadays. I want to
discuss, but as a topic by itself.

I don't think we need to resolve all of the problems in one patch :)

On Fri, Apr 26, 2024 at 8:54 AM Brian Norris  wrote:
>
> Hi Angel,
>
> On Thu, Apr 25, 2024 at 11:44:47AM +, Angel Pons wrote:
> > On Thu, Apr 25, 2024 at 10:26 AM Anastasia Klimchuk  
> > wrote:
>
> > /*
> >  * Work around chips which need some time to calm down.
> >  * Not needed for SPI flash chips because of the bus' strict timing.
> >  */
> > if (flash->chip->bustype != BUS_SPI)
> > programmer_delay(flashctx, 1000*1000);
>
> For the record, linux_mtd is a primary target for all Arm Chromebooks,
> and it registers an opaque programme with BUG_PROG. Presumably an
> "opaque" chip qualifies as "safe enough" too, because the opaque
> abstraction should be taking care of timing details? If not, then this
> is not a sufficient solution for me.
>
> I guess my proposal would probably enumerate the old buses
> (BUS_PARALLEL, BUS_LPC, BUS_FWH) specifically, to avoid further
> unintentional inclusion.
>
> Anyway, I appreciate the general thrust of your thoughts and
> suggestions. I'll reply to a few bits below.
>
> > [The following is a textual representation of my thought processes;
> > it's not particularly necessary but I felt it could be interesting]
>
> Appreciated!
>
> > My idea is to maintain backwards compatibility while still enabling
> > new features and improvements. [...]
>
> I appreciate the attempt at a middle ground here.
>
> > > On Thu, Apr 25, 2024 at 7:35 AM Angel Pons  wrote:
> > > > On Wed, Apr 24, 2024 at 9:16 PM Brian Norris  
> > > > wrote:
> > > > >
> > > > > Background: https://review.coreboot.org/c/flashrom/+/80807
> > > > >
> > > > > A long time ago, in the pre-git times [1], flashrom added a 1 second
> > > > > delay to all verification, and claimed that some chips "need some time
> > > > > to calm down." The commit message claims it "fixes a few reports where
> > > > > verify directly after erase had unpleasant side effects like
> > > > > corrupting flash or at least getting incorrect verify results." It
> > > > > provides no details of what systems, chips, or programmers were
> > > > > involved.
> > > >
> > > > Back then, SPI flash chips weren't as ubiquitous as they currently
> > > > are. This workaround is most likely for Parallel/LPC/FWH flash chips,
> > > > which can actually be quite weird.
>
> Interesting callout. I've dealt with some parallel NOR as well in the
> past, and I don't really recall magical sleeps there either. Look around
> at Linux's drivers/mtd/chips/cfi* for example. There's some incidence of
> udelay()/msleep()/etc., but they're coupled with status checks and
> polling loops, similar to any SPI protocol. The only exception I found
> is a verbosely documented cfi_fixup_m29ew_delay_after_resume() sleep.
> And even that's only 0.5 milliseconds.
>
> But I suppose the existence of good parallel flash drivers doesn't
> preclude the existence of poor ones elsewhere.
>
> Do you have any kind of examples of what "weirdness" might be present
> here? Or is this just a general feeling and guess based on the year of
> the original change?
>
> > > > If we don't know for sure whether this delay is no longer needed, I
> > > > wouldn't risk re-introducing issues in flashrom.
>
> I don't think that condition is possible to satisfy given sufficiently
> under-documented "fixes" from ancient history. I appreciate that we
> don't want to regress things, but when there's no evidence or details of
> the initial problem, it's logically impossible to prove its
> non-existence now.
>
> But toward a non-guaranteed proof: see consideration #3 in my
> aforementioned patch. We already perform verify-after-erase steps
> without any such delay. Wouldn't this path be affected in the same way,
> if it was still a real problem? Similarly, we have no such protection
> against a sequence of `flashrom --noverify --write; flashrom --read`. So
> even if the delay was serving some purpose, it seems a wholly
> insufficient one that feels little better than a band-aid.
>
> Still, maybe y'all consider the band-aid better than ripping it off. And
> if so, I'll consider incorporating your suggestion.
>
> > > > Do Chromebooks use non-SPI flash chips? They probably don't.
>
> Not that I know of.
>
> > > > Did you conside making it so that only SPI programmers / flash chips
> > > > skip the 

[flashrom] Re: RFC: removing 1 second verification delay

2024-04-25 Thread Brian Norris
Hi Angel,

On Thu, Apr 25, 2024 at 11:44:47AM +, Angel Pons wrote:
> On Thu, Apr 25, 2024 at 10:26 AM Anastasia Klimchuk  wrote:

> /*
>  * Work around chips which need some time to calm down.
>  * Not needed for SPI flash chips because of the bus' strict timing.
>  */
> if (flash->chip->bustype != BUS_SPI)
> programmer_delay(flashctx, 1000*1000);

For the record, linux_mtd is a primary target for all Arm Chromebooks,
and it registers an opaque programme with BUG_PROG. Presumably an
"opaque" chip qualifies as "safe enough" too, because the opaque
abstraction should be taking care of timing details? If not, then this
is not a sufficient solution for me.

I guess my proposal would probably enumerate the old buses
(BUS_PARALLEL, BUS_LPC, BUS_FWH) specifically, to avoid further
unintentional inclusion.

Anyway, I appreciate the general thrust of your thoughts and
suggestions. I'll reply to a few bits below.

> [The following is a textual representation of my thought processes;
> it's not particularly necessary but I felt it could be interesting]

Appreciated!

> My idea is to maintain backwards compatibility while still enabling
> new features and improvements. [...]

I appreciate the attempt at a middle ground here.

> > On Thu, Apr 25, 2024 at 7:35 AM Angel Pons  wrote:
> > > On Wed, Apr 24, 2024 at 9:16 PM Brian Norris  
> > > wrote:
> > > >
> > > > Background: https://review.coreboot.org/c/flashrom/+/80807
> > > >
> > > > A long time ago, in the pre-git times [1], flashrom added a 1 second
> > > > delay to all verification, and claimed that some chips "need some time
> > > > to calm down." The commit message claims it "fixes a few reports where
> > > > verify directly after erase had unpleasant side effects like
> > > > corrupting flash or at least getting incorrect verify results." It
> > > > provides no details of what systems, chips, or programmers were
> > > > involved.
> > >
> > > Back then, SPI flash chips weren't as ubiquitous as they currently
> > > are. This workaround is most likely for Parallel/LPC/FWH flash chips,
> > > which can actually be quite weird.

Interesting callout. I've dealt with some parallel NOR as well in the
past, and I don't really recall magical sleeps there either. Look around
at Linux's drivers/mtd/chips/cfi* for example. There's some incidence of
udelay()/msleep()/etc., but they're coupled with status checks and
polling loops, similar to any SPI protocol. The only exception I found
is a verbosely documented cfi_fixup_m29ew_delay_after_resume() sleep.
And even that's only 0.5 milliseconds.

But I suppose the existence of good parallel flash drivers doesn't
preclude the existence of poor ones elsewhere.

Do you have any kind of examples of what "weirdness" might be present
here? Or is this just a general feeling and guess based on the year of
the original change?

> > > If we don't know for sure whether this delay is no longer needed, I
> > > wouldn't risk re-introducing issues in flashrom.

I don't think that condition is possible to satisfy given sufficiently
under-documented "fixes" from ancient history. I appreciate that we
don't want to regress things, but when there's no evidence or details of
the initial problem, it's logically impossible to prove its
non-existence now.

But toward a non-guaranteed proof: see consideration #3 in my
aforementioned patch. We already perform verify-after-erase steps
without any such delay. Wouldn't this path be affected in the same way,
if it was still a real problem? Similarly, we have no such protection
against a sequence of `flashrom --noverify --write; flashrom --read`. So
even if the delay was serving some purpose, it seems a wholly
insufficient one that feels little better than a band-aid.

Still, maybe y'all consider the band-aid better than ripping it off. And
if so, I'll consider incorporating your suggestion.

> > > Do Chromebooks use non-SPI flash chips? They probably don't.

Not that I know of.

> > > Did you conside making it so that only SPI programmers / flash chips
> > > skip the delay? The SPI bus' strict timing leaves no room for this
> > > problem to occur, so it should be safe to skip this delay. And this
> > > would keep non-SPI unaffected (which is most likely what needs this
> > > extra delay).

I did not consider this yet, but I'm considering it now! If this is the
best semi-conservative solution we can come up with, I suppose I'm fine
with that. Per the above, I don't think non-SPI is relevant to anything
I care about, so maybe it's good enough.

Thanks for your thoughts,
Brian
___
flashrom mailing list -- flashrom@flashrom.org
To unsubscribe send an email to flashrom-le...@flashrom.org


[flashrom] Re: RFC: removing 1 second verification delay

2024-04-25 Thread Angel Pons
Hi Anastasia, list,

On Thu, Apr 25, 2024 at 10:26 AM Anastasia Klimchuk  wrote:
>
> Angel, great to see you here again! Also, you provided very useful 
> information.
>
> > Did you conside making it so that only SPI programmers / flash chips
> > skip the delay? The SPI bus' strict timing leaves no room for this
> > problem to occur, so it should be safe to skip this delay. And this
> > would keep non-SPI unaffected (which is most likely what needs this
> > extra delay).
>
> This sounds interesting.
> If we check for non-SPI chips, it only adds one more line to the patch
> (if I understand your idea correctly):
>
> if (flash->chip->bustype == BUS_NONSPI)

I haven't looked into the specifics at all, but that's the idea. It
would be good to test that the delay is still executed on non-SPI
flash chips. I think `BUS_NONSPI` actually encompasses multiple buses,
so a simple equality test might not work. I would try something like
this:

/*
 * Work around chips which need some time to calm down.
 * Not needed for SPI flash chips because of the bus' strict timing.
 */
if (flash->chip->bustype != BUS_SPI)
programmer_delay(flashctx, 1000*1000);

[The following is a textual representation of my thought processes;
it's not particularly necessary but I felt it could be interesting]

My idea is to maintain backwards compatibility while still enabling
new features and improvements. To do this, my very general approach is
to look for "cut-off points", i.e. some criteria to differentiate
between "preserve existing behaviour to avoid regressions" and "add
new features and/or make existing stuff better", seeking to maximise
the extent of the improvements (e.g. improving SPI support in general
instead of artificially restricting the improvements to a few
chips/programmers) while also trying to minimise the chance of
regressions (which can be fatal in the case of flashrom: some LPC/FWH
flash chips are TSOP32, externally reflashing them sounds very
not-fun). Analytically solving this is most likely impossible, but for
specific cases it's possible to find a reasonably acceptable ("good
enough") balance with the usual problem-solving tools: reason (e.g.
check against specifications, test things, etc.), intuition (useful as
a heuristic to get "directions" to explore further), and communication
(e.g. this very mailing list thread).

In this case (removing the delay in the verify path), my thoughts were
(not necessarily in order):
- I feel this delay was introduced for a good reason (even though I
have no idea what the reason is), so removing it will probably break
something.
- But do we have to remove the delay for *everything*, or can we
conditionally skip it in some circumstances? This isn't a strict
yes/no scenario.
- Looks like the messages proposing this change pretty much only talk
about flashing SPI chips, maybe this delay relates to the bus somehow.
Plus, the delay was introduced a long time ago.
- The SPI bus has very strict timing requirements (emulating SPI flash
isn't easy). However, Non-SPI flash chips don't have the same
requirements (older laptop ECs used SPI flash chips, but exposed the
flash contents to the chipset over LPC).
- It's likely this delay isn't needed for SPI because of all of this,
so let's consider skipping the delay for SPI. It's what most (but not
all) people use these flashrom for these days.
- Even if some non-SPI chips don't need the delay, it's much harder to
test this. Plus, keeping the delay in there for non-SPI won't hurt.

> On Thu, Apr 25, 2024 at 7:35 AM Angel Pons  wrote:
> >
> > Hi Brian, list,
> >
> > Thanks for bringing this up on the mailing list.
> >
> > On Wed, Apr 24, 2024 at 9:16 PM Brian Norris  
> > wrote:
> > >
> > > Background: https://review.coreboot.org/c/flashrom/+/80807
> > >
> > > A long time ago, in the pre-git times [1], flashrom added a 1 second
> > > delay to all verification, and claimed that some chips "need some time
> > > to calm down." The commit message claims it "fixes a few reports where
> > > verify directly after erase had unpleasant side effects like
> > > corrupting flash or at least getting incorrect verify results." It
> > > provides no details of what systems, chips, or programmers were
> > > involved.
> >
> > Back then, SPI flash chips weren't as ubiquitous as they currently
> > are. This workaround is most likely for Parallel/LPC/FWH flash chips,
> > which can actually be quite weird.
> >
> > > This delay remains in the --verify path today, and IMO, it's a big
> > > waste of time. If there are truly chips that are, say, deasserting the
> > > BUSY line before they're truly finished programming ... well, we
> > > should track those chips down and add targeted quirk flags for them.
> > > We shouldn't penalize all flashrom users in all cases for all time.
> >
> > I agree that this delay shouldn't be unconditional.
> >
> > > Personally, I highly doubt that this delay is still relevant today --
> > > there may have been a bug in some programmer 

[flashrom] Re: RFC: removing 1 second verification delay

2024-04-25 Thread Anastasia Klimchuk
Angel, great to see you here again! Also, you provided very useful information.

> Did you conside making it so that only SPI programmers / flash chips
> skip the delay? The SPI bus' strict timing leaves no room for this
> problem to occur, so it should be safe to skip this delay. And this
> would keep non-SPI unaffected (which is most likely what needs this
> extra delay).

This sounds interesting.
If we check for non-SPI chips, it only adds one more line to the patch
(if I understand your idea correctly):

if (flash->chip->bustype == BUS_NONSPI)

On Thu, Apr 25, 2024 at 7:35 AM Angel Pons  wrote:
>
> Hi Brian, list,
>
> Thanks for bringing this up on the mailing list.
>
> On Wed, Apr 24, 2024 at 9:16 PM Brian Norris  wrote:
> >
> > Background: https://review.coreboot.org/c/flashrom/+/80807
> >
> > A long time ago, in the pre-git times [1], flashrom added a 1 second
> > delay to all verification, and claimed that some chips "need some time
> > to calm down." The commit message claims it "fixes a few reports where
> > verify directly after erase had unpleasant side effects like
> > corrupting flash or at least getting incorrect verify results." It
> > provides no details of what systems, chips, or programmers were
> > involved.
>
> Back then, SPI flash chips weren't as ubiquitous as they currently
> are. This workaround is most likely for Parallel/LPC/FWH flash chips,
> which can actually be quite weird.
>
> > This delay remains in the --verify path today, and IMO, it's a big
> > waste of time. If there are truly chips that are, say, deasserting the
> > BUSY line before they're truly finished programming ... well, we
> > should track those chips down and add targeted quirk flags for them.
> > We shouldn't penalize all flashrom users in all cases for all time.
>
> I agree that this delay shouldn't be unconditional.
>
> > Personally, I highly doubt that this delay is still relevant today --
> > there may have been a bug in some programmer that has since been
> > fixed; there may have been some malfunctioning system that is no
> > longer in use; ... or it's possible this is still hiding a real buggy
> > chip somewhere out there.
>
> If we don't know for sure whether this delay is no longer needed, I
> wouldn't risk re-introducing issues in flashrom.
>
> > I propose: we still remove the delay. There's plenty of description in
> > the above Gerrit link about why, and how we can handle regressions.
> > (For one, it's relatively simple to split up a --verify operation into
> > its constituent --write/sleep/--read operations, for debugging.)
> >
> > The request:
> > 1. Tests: I've tested a few Chromebooks, but imagine this had to have
> > been some more esoteric system. Extra testing is welcome.
>
> Do Chromebooks use non-SPI flash chips? They probably don't.
>
> > 2. Thoughts: does my proposal make sense? Am I missing something obvious?
>
> Did you conside making it so that only SPI programmers / flash chips
> skip the delay? The SPI bus' strict timing leaves no room for this
> problem to occur, so it should be safe to skip this delay. And this
> would keep non-SPI unaffected (which is most likely what needs this
> extra delay).
>
> > 3. Awareness: if nothing else, this email may serve to highlight in
> > case we land the above patch and later hear back that there are some
> > sketchy --verify reports from users.
>
> I agree that awareness is important, but bear in mind that external
> programmers for non-SPI flash chips are increasingly rare, so
> recovering from a brick can be quite convoluted (e.g. having to
> desolder a TSOP32 flash chip to reflash it somehow).
>
> > Thanks,
> > Brian
> >
> > [1] The svn->git commit in question:
> >
> >   commit 8ab49e72af8465d4527de2ec37b22cd44f7a1169
> >   Date:   Wed Aug 19 13:55:34 2009 +
> >   Disallow erase/write for known bad chips so people won't try without
> > a clear understanding
> > ___
> > flashrom mailing list -- flashrom@flashrom.org
> > To unsubscribe send an email to flashrom-le...@flashrom.org
>
> Best regards,
> Angel
> ___
> flashrom mailing list -- flashrom@flashrom.org
> To unsubscribe send an email to flashrom-le...@flashrom.org

-- 
Anastasia.
___
flashrom mailing list -- flashrom@flashrom.org
To unsubscribe send an email to flashrom-le...@flashrom.org


[flashrom] Re: RFC: removing 1 second verification delay

2024-04-24 Thread Angel Pons
Hi Brian, list,

Thanks for bringing this up on the mailing list.

On Wed, Apr 24, 2024 at 9:16 PM Brian Norris  wrote:
>
> Background: https://review.coreboot.org/c/flashrom/+/80807
>
> A long time ago, in the pre-git times [1], flashrom added a 1 second
> delay to all verification, and claimed that some chips "need some time
> to calm down." The commit message claims it "fixes a few reports where
> verify directly after erase had unpleasant side effects like
> corrupting flash or at least getting incorrect verify results." It
> provides no details of what systems, chips, or programmers were
> involved.

Back then, SPI flash chips weren't as ubiquitous as they currently
are. This workaround is most likely for Parallel/LPC/FWH flash chips,
which can actually be quite weird.

> This delay remains in the --verify path today, and IMO, it's a big
> waste of time. If there are truly chips that are, say, deasserting the
> BUSY line before they're truly finished programming ... well, we
> should track those chips down and add targeted quirk flags for them.
> We shouldn't penalize all flashrom users in all cases for all time.

I agree that this delay shouldn't be unconditional.

> Personally, I highly doubt that this delay is still relevant today --
> there may have been a bug in some programmer that has since been
> fixed; there may have been some malfunctioning system that is no
> longer in use; ... or it's possible this is still hiding a real buggy
> chip somewhere out there.

If we don't know for sure whether this delay is no longer needed, I
wouldn't risk re-introducing issues in flashrom.

> I propose: we still remove the delay. There's plenty of description in
> the above Gerrit link about why, and how we can handle regressions.
> (For one, it's relatively simple to split up a --verify operation into
> its constituent --write/sleep/--read operations, for debugging.)
>
> The request:
> 1. Tests: I've tested a few Chromebooks, but imagine this had to have
> been some more esoteric system. Extra testing is welcome.

Do Chromebooks use non-SPI flash chips? They probably don't.

> 2. Thoughts: does my proposal make sense? Am I missing something obvious?

Did you conside making it so that only SPI programmers / flash chips
skip the delay? The SPI bus' strict timing leaves no room for this
problem to occur, so it should be safe to skip this delay. And this
would keep non-SPI unaffected (which is most likely what needs this
extra delay).

> 3. Awareness: if nothing else, this email may serve to highlight in
> case we land the above patch and later hear back that there are some
> sketchy --verify reports from users.

I agree that awareness is important, but bear in mind that external
programmers for non-SPI flash chips are increasingly rare, so
recovering from a brick can be quite convoluted (e.g. having to
desolder a TSOP32 flash chip to reflash it somehow).

> Thanks,
> Brian
>
> [1] The svn->git commit in question:
>
>   commit 8ab49e72af8465d4527de2ec37b22cd44f7a1169
>   Date:   Wed Aug 19 13:55:34 2009 +
>   Disallow erase/write for known bad chips so people won't try without
> a clear understanding
> ___
> flashrom mailing list -- flashrom@flashrom.org
> To unsubscribe send an email to flashrom-le...@flashrom.org

Best regards,
Angel
___
flashrom mailing list -- flashrom@flashrom.org
To unsubscribe send an email to flashrom-le...@flashrom.org