Re: [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks

2022-03-22 Thread ZmnSCPxj via bitcoin-dev
Good morning aj,

> On Tue, Mar 22, 2022 at 05:37:03AM +, ZmnSCPxj via bitcoin-dev wrote:
>
> > Subject: Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks
>
> (Have you considered applying a jit or some other compression algorithm
> to your emails?)
>
> > Microcode For Bitcoin SCRIPT
> > ============================
> >
> > I propose:
> >
> > -   Define a generic, low-level language (the "RISC language").
>
> This is pretty much what Simplicity does, if you optimise the low-level
> language to minimise the number of primitives and maximise the ability
> to apply tooling to reason about it, which seem like good things for a
> RISC language to optimise.
>
> > -   Define a mapping from a specific, high-level language to
> > the above language (the microcode).
> >
> > -   Allow users to sacrifice Bitcoins to define a new microcode.
>
> I think you're defining "the microcode" as the "mapping" here.

Yes.

>
> This is pretty similar to the suggestion Bram Cohen was making a couple
> of months ago:
>
> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2021-December/019722.html
> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-January/019773.html
> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-January/019803.html
>
> I believe this is done in chia via the block being able to
> include-by-reference prior blocks' transaction generators:
>
> ] transactions_generator_ref_list: List[uint32]: A list of block heights of 
> previous generators referenced by this block's generator.
>
> -   https://docs.chia.net/docs/05block-validation/block_format
>
> (That approach comes at the cost of not being able to do full validation
> if you're running a pruning node. The alternative is to effectively
> introduce a parallel "utxo" set -- where you're mapping the "sacrificed"
> BTC as the nValue and instead of just mapping it to a scriptPubKey for
> a later spend, you're permanently storing the definition of the new
> CISC opcode)
>
>

Yes, the latter is basically what microcode is.

> > We can then support a "RISC" language that is composed of
> > general instructions, such as arithmetic, SECP256K1 scalar
> > and point math, bytevector concatenation, sha256 midstates,
> > bytevector bit manipulation, transaction introspection, and
> > so on.
>
> A language that includes instructions for each operation we can think
> of isn't very "RISC"... More importantly it gets straight back to the
> "we've got a new zk system / ECC curve / ... that we want to include,
> let's do a softfork" problem you were trying to avoid in the first place.

`libsecp256k1` can run on purely RISC machines like ARM, so saying that a 
"RISC" set of opcodes cannot implement some arbitrary ECC curve, when the 
instruction set does not directly support that ECC curve, seems incorrect.

Any new zk system / ECC curve would have to be implementable in C++, so if you 
have micro-operations that would be needed for it, such as XORing two 
multi-byte vectors together, multiplying multi-byte precision numbers, etc., 
then any new zk system or ECC curve would be implementable in microcode.
For that matter, you could re-write `libsecp256k1` there.
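
For illustration, a minimal sketch in C++ of one such micro-operation (the 
`uop_xor` name, the bytevector type, and the stack representation are all 
hypothetical, not a concrete proposal):

    #include <cstdint>
    #include <stdexcept>
    #include <vector>

    using Bytes = std::vector<std::uint8_t>;

    // Hypothetical micro-operation: pop two equal-length bytevectors,
    // push their bytewise XOR.  Primitives of roughly this granularity
    // are enough to build up new hash functions or field arithmetic.
    void uop_xor(std::vector<Bytes>& stack) {
        if (stack.size() < 2) throw std::runtime_error("stack underflow");
        Bytes b = std::move(stack.back()); stack.pop_back();
        Bytes a = std::move(stack.back()); stack.pop_back();
        if (a.size() != b.size()) throw std::runtime_error("length mismatch");
        for (std::size_t i = 0; i < a.size(); ++i) a[i] ^= b[i];
        stack.push_back(std::move(a));
    }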

> > Then, the user creates a new transaction where one of
> > the outputs contains, say, 1.0 Bitcoins (exact required
> > value TBD),
>
> Likely, the "fair" price would be the cost of introducing however many
> additional bytes to the utxo set that it would take to represent your
> microcode, and the cost it would take to run jit(your microcode script)
> if that were a validation function. Both seem pretty hard to manage.
>
> "Ideally", I think you'd want to be able to say "this old microcode
> no longer has any value, let's forget it, and instead replace it with
> this new microcode that is much better" -- that way nodes don't have to
> keep around old useless data, and you've reduced the cost of introducing
> new functionality.

Yes, but that invites "I accidentally the smart contract" behavior.

> Additionally, I think it has something of a tragedy-of-the-commons
> problem: whoever creates the microcode pays the cost, but then anyone
> can use it and gain the benefit. That might even end up creating
> centralisation pressure: if you design a highly decentralised L2 system,
> it ends up expensive because people can't coordinate to pay for the
> new microcode that would make it cheaper; but if you design a highly
> centralised L2 system, you can just pay for the microcode yourself and
> make it even cheaper.

The same "tragedy of the commons" applies to FOSS.
"whoever creates the FOSS pays the cost, but then anyone can use it and gain 
the benefit"
This seems like an argument against releasing FOSS node software.

Remember, microcode is software too, and copying software does not have a 
tragedy of the commons --- the main point of a tragedy of the commons is that 
the commons is *degraded* by the use but nobody has an incentive to maintain 
it.
Re: [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks

2022-03-22 Thread Anthony Towns via bitcoin-dev
On Tue, Mar 22, 2022 at 05:37:03AM +, ZmnSCPxj via bitcoin-dev wrote:
> Subject: Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks

(Have you considered applying a jit or some other compression algorithm
to your emails?)

> Microcode For Bitcoin SCRIPT
> ============================
> I propose:
> * Define a generic, low-level language (the "RISC language").

This is pretty much what Simplicity does, if you optimise the low-level
language to minimise the number of primitives and maximise the ability
to apply tooling to reason about it, which seem like good things for a
RISC language to optimise.

> * Define a mapping from a specific, high-level language to
>   the above language (the microcode).
> * Allow users to sacrifice Bitcoins to define a new microcode.

I think you're defining "the microcode" as the "mapping" here.

This is pretty similar to the suggestion Bram Cohen was making a couple
of months ago:

https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2021-December/019722.html
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-January/019773.html
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-January/019803.html

I believe this is done in chia via the block being able to
include-by-reference prior blocks' transaction generators:

] transactions_generator_ref_list: List[uint32]: A list of block heights of 
previous generators referenced by this block's generator.
  - https://docs.chia.net/docs/05block-validation/block_format

(That approach comes at the cost of not being able to do full validation
if you're running a pruning node. The alternative is to effectively
introduce a parallel "utxo" set -- where you're mapping the "sacrificed"
BTC as the nValue and instead of just mapping it to a scriptPubKey for
a later spend, you're permanently storing the definition of the new
CISC opcode)

> We can then support a "RISC" language that is composed of
> general instructions, such as arithmetic, SECP256K1 scalar
> and point math, bytevector concatenation, sha256 midstates,
> bytevector bit manipulation, transaction introspection, and
> so on.

A language that includes instructions for each operation we can think
of isn't very "RISC"... More importantly it gets straight back to the
"we've got a new zk system / ECC curve / ... that we want to include,
let's do a softfork" problem you were trying to avoid in the first place.

> Then, the user creates a new transaction where one of
> the outputs contains, say, 1.0 Bitcoins (exact required
> value TBD),

Likely, the "fair" price would be the cost of introducing however many
additional bytes to the utxo set that it would take to represent your
microcode, and the cost it would take to run jit(your microcode script)
if that were a validation function. Both seem pretty hard to manage.

"Ideally", I think you'd want to be able to say "this old microcode
no longer has any value, let's forget it, and instead replace it with
this new microcode that is much better" -- that way nodes don't have to
keep around old useless data, and you've reduced the cost of introducing
new functionality.

Additionally, I think it has something of a tragedy-of-the-commons
problem: whoever creates the microcode pays the cost, but then anyone
can use it and gain the benefit. That might even end up creating
centralisation pressure: if you design a highly decentralised L2 system,
it ends up expensive because people can't coordinate to pay for the
new microcode that would make it cheaper; but if you design a highly
centralised L2 system, you can just pay for the microcode yourself and
make it even cheaper.

This approach isn't very composable -- if there's a clever opcode
defined in one microcode spec, and another one in some other microcode,
the only way to use both of them in the same transaction is to burn 1
BTC to define a new microcode that includes both of them.

> We want to be able to execute the defined microcode
> faster than expanding an `OP_`-code SCRIPT to a
> `UOP_`-code SCRIPT and having an interpreter loop
> over the `UOP_`-code SCRIPT.
>
> We can use LLVM.

We've not long ago gone to the effort of removing openssl as a consensus
critical dependency; and likewise previously removed bdb.  Introducing a
huge new dependency to the definition of consensus seems like an enormous
step backwards.

This would also mean we'd be stuck at the performance of whatever version
of llvm we initially adopted, as any performance improvements introduced
in later llvm versions would be a hard fork.

> On the other hand, LLVM bugs are compiler bugs and
> the same bugs can hit the static compiler `cc`, too,

"Well, you could hit Achilles in the heel, so really, what's the point
of trying to be invulnerable anywhere else?"

> Then we put a pointer to this compiled function to a
> 256-long array of functions, where the array index is
> the `OP_` code.

That's a 256-long array of functions for each microcode, which increases
the "microcode-utxo" database size.

Re: [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks

2022-03-22 Thread ZmnSCPxj via bitcoin-dev


Good morning again Russell,

> Good morning Russell,
>
> > Thanks for the clarification.
> > You don't think referring to the microcode via its hash, effectively using 
> > 32-byte encoding of opcodes, is still rather long winded?

For that matter, since an entire microcode represents a language (based on the 
current OG Bitcoin SCRIPT language), with a little more coordination, we could 
entirely replace Tapscript versions --- every Tapscript version is a slot for a 
microcode, and the current OG Bitcoin SCRIPT is just the one in slot `0xc0`.
Filled slots cannot be changed, but new microcodes can use some currently-empty 
Tapscript version slot, and have it properly defined in a microcode 
introduction outpoint.

Then indication of a microcode would take only one byte, a byte which is 
already needed currently anyway.

That does limit us to only 255 new microcodes, thus the cost of one microcode 
would have to be a good bit higher.

Again, remember, microcodes represent an entire language that is an extension 
of OG Bitcoin SCRIPT, not individual operations in that language.
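
A sketch of the slot bookkeeping this implies (a minimal sketch; the details 
of how slots are claimed are hypothetical):

    #include <array>
    #include <cstdint>
    #include <map>

    using MicrocodeHash = std::array<std::uint8_t, 32>;

    // Tapscript leaf version -> microcode hash, filled in by
    // deeply-confirmed microcode introduction outpoints.
    std::map<std::uint8_t, MicrocodeHash> g_slots;

    // Slot 0xc0 is OG Bitcoin SCRIPT; a filled slot can never be changed.
    bool claim_slot(std::uint8_t version, const MicrocodeHash& h) {
        if (version == 0xc0) return false;          // reserved: OG SCRIPT
        return g_slots.emplace(version, h).second;  // false if already taken
    }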

Regards,
ZmnSCPxj


Re: [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks

2022-03-22 Thread ZmnSCPxj via bitcoin-dev
Good morning Russell,

> Thanks for the clarification.
>
> You don't think referring to the microcode via its hash, effectively using 
> 32-byte encoding of opcodes, is still rather long winded?

A microcode is a *mapping* of `OP_` codes to a variable-length sequence of 
`UOP_` micro-opcodes.
So a microcode hash refers to an entire language of redefined `OP_` codes, not 
each individual opcode in the language.
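
In code terms, roughly (a sketch; the `UOP_` alphabet is left abstract):

    #include <cstdint>
    #include <map>
    #include <vector>

    enum class UOp : std::uint16_t;  // the UOP_ micro-opcode alphabet (elided)

    // A microcode: each redefined OP_ code expands to a variable-length
    // sequence of UOP_ micro-opcodes.  The microcode hash commits to this
    // whole table, i.e. to an entire language, not to any single opcode.
    using Microcode = std::map<std::uint8_t, std::vector<UOp>>;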

If it costs 1 Bitcoin to create a new microcode, then there are only 21 million 
possible microcodes, and I think about 50 bits of hash is sufficient to specify 
those with low probability of collision.
We could use a 20-byte RIPEMD160 . SHA256 instead for 160 bits; that should be 
more than sufficient with enough margin.
Though perhaps it is now easier to deliberately attack...
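
For concreteness, a back-of-the-envelope birthday-bound check of both sizes, 
assuming the 21-million cap:

    #include <cmath>
    #include <cstdio>

    int main() {
        const double n = 21e6;  // at 1 BTC each, at most ~21M microcodes
        const double pairs = n * (n - 1) / 2.0;
        for (int bits : {50, 160}) {
            // birthday bound: P(collision) ~= 1 - exp(-pairs / 2^bits)
            double p = -std::expm1(-pairs * std::pow(2.0, -bits));
            std::printf("%3d-bit hash: collision probability ~ %.3g\n", bits, p);
        }
    }

This prints roughly 0.18 for 50 bits and about 1.5e-34 for 160 bits.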

Also, if you have a common SCRIPT whose non-`OP_PUSH` opcodes are more than say 
32 + 1 bytes (or 20 + 1 if using RIPEMD160), and you can fit their equivalent 
`UOP_` codes into the max limit for a *single* opcode, you can save bytes by 
redefining some random `OP_` code into the sequence of all the `UOP_` codes.
You would have a hash reference to the microcode, and a single byte for the 
actual "SCRIPT" which is just a jet of the entire SCRIPT.
Users of multiple *different* such SCRIPTs can band together to define a single 
microcode, mapping their SCRIPTs to different `OP_` codes and sharing the cost 
of defining the new microcode that shortens all their SCRIPTs.
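
As an illustrative count (numbers made up): a SCRIPT whose non-`OP_PUSH` 
opcodes total 50 bytes, jetted into a single redefined `OP_` code, costs 1 
byte plus the 32-byte (or 20-byte) microcode hash reference, i.e. 33 (or 21) 
bytes, saving 17 (or 29) bytes on every spend.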

Regards,
ZmnSCPxj


Re: [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks

2022-03-22 Thread Russell O'Connor via bitcoin-dev
Thanks for the clarification.

You don't think referring to the microcode via its hash, effectively using
32-byte encoding of opcodes, is still rather long winded?

On Tue, Mar 22, 2022 at 12:23 PM ZmnSCPxj  wrote:

> Good morning Russell,
>
> > Setting aside my thoughts that something like Simplicity would make a
> > better platform than Bitcoin Script (due to expressions operating on a
> > narrower interface than the entire stack (I'm looking at you OP_DEPTH))
> > there is an issue with namespace management.
> >
> > If I understand correctly, your implication was that once opcodes are
> > redefined by an OP_RETURN transaction, subsequent transactions of that
> > opcode refer to the new microcode.  But then we have a race condition
> > between people submitting transactions expecting the outputs to refer to
> > the old code and having their code redefined by the time they do get
> > confirmed (or, worse, having them reorged).
>
> No, use of specific microcodes is opt-in: you have to use a specific
> `0xce` Tapscript version, ***and*** refer to the microcode you want to use
> via the hash of the microcode.
>
> The only race condition is reorging out a newly-defined microcode.
> This can be avoided by waiting for deep confirmation of a newly-defined
> microcode before actually using it.
>
> But once the microcode introduction outpoint of a particular microcode has
> been deeply confirmed, then your Tapscript can refer to the microcode, and
> its meaning does not change.
>
> Fullnodes may need to maintain multiple microcodes, which is why creating
> new microcodes is expensive; they not only require JIT compilation, they
> also require that fullnodes keep an index that cannot have items deleted.
>
>
> The advantage of the microcode scheme is that the size of the SCRIPT can
> be used as a proxy for CPU load, just as is done for current Bitcoin
> SCRIPT.
> As long as the number of `UOP_` micro-opcodes that an `OP_` code can
> expand to is bounded, and we avoid looping constructs, then the CPU load is
> also bounded and the size of the SCRIPT approximates the amount of
> processing needed, thus microcode does not require a softfork to modify
> weight calculations in the future.
>
> Regards,
> ZmnSCPxj
>


Re: [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks

2022-03-22 Thread ZmnSCPxj via bitcoin-dev
Good morning Russell,

> Setting aside my thoughts that something like Simplicity would make a better 
> platform than Bitcoin Script (due to expressions operating on a narrower 
> interface than the entire stack (I'm looking at you OP_DEPTH)) there is an 
> issue with namespace management.
>
> If I understand correctly, your implication was that once opcodes are 
> redefined by an OP_RETURN transaction, subsequent transactions of that opcode 
> refer to the new microcode.  But then we have a race condition between 
> people submitting transactions expecting the outputs to refer to the old code 
> and having their code redefined by the time they do get confirmed (or, 
> worse, having them reorged).

No, use of specific microcodes is opt-in: you have to use a specific `0xce` 
Tapscript version, ***and*** refer to the microcode you want to use via the 
hash of the microcode.

The only race condition is reorging out a newly-defined microcode.
This can be avoided by waiting for deep confirmation of a newly-defined 
microcode before actually using it.

But once the microcode introduction outpoint of a particular microcode has been 
deeply confirmed, then your Tapscript can refer to the microcode, and its 
meaning does not change.
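
A sketch of the opt-in lookup at validation time (the `0xce` version as 
above; everything else here is hypothetical):

    #include <array>
    #include <cstdint>
    #include <map>
    #include <stdexcept>

    using MicrocodeHash = std::array<std::uint8_t, 32>;
    struct Microcode { /* OP_ -> UOP_ expansion table (elided) */ };

    std::map<MicrocodeHash, Microcode> g_microcodes;

    // Only Tapscript leaf version 0xce opts in to microcode, and the
    // microcode is pinned by hash, so a SCRIPT's meaning cannot change
    // once the referenced microcode is deeply confirmed.
    const Microcode& resolve_microcode(std::uint8_t leaf_version,
                                       const MicrocodeHash& h) {
        if (leaf_version != 0xce)
            throw std::invalid_argument("not a microcode-using leaf");
        auto it = g_microcodes.find(h);
        if (it == g_microcodes.end())
            throw std::runtime_error("unknown microcode");
        return it->second;
    }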

Fullnodes may need to maintain multiple microcodes, which is why creating new 
microcodes is expensive; they not only require JIT compilation, they also 
require that fullnodes keep an index that cannot have items deleted.


The advantage of the microcode scheme is that the size of the SCRIPT can be 
used as a proxy for CPU load, just as is done for current Bitcoin SCRIPT.
As long as the number of `UOP_` micro-opcodes that an `OP_` code can expand to 
is bounded, and we avoid looping constructs, then the CPU load is also bounded 
and the size of the SCRIPT approximates the amount of processing needed, thus 
microcode does not require a softfork to modify weight calculations in the 
future.
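
A sketch of the bound being relied on (the cap `MAX_UOPS_PER_OP` is 
hypothetical):

    #include <cstddef>
    #include <cstdint>
    #include <map>
    #include <vector>

    enum class UOp : std::uint16_t;  // no looping constructs among these
    using Microcode = std::map<std::uint8_t, std::vector<UOp>>;

    constexpr std::size_t MAX_UOPS_PER_OP = 256;  // hypothetical cap

    // Checked when a microcode is introduced: with bounded expansion and
    // no loops, a SCRIPT of n opcode bytes runs at most
    // n * MAX_UOPS_PER_OP micro-operations, so byte size stays a valid
    // proxy for CPU cost and weight calculations need not change.
    bool expansion_bounded(const Microcode& mc) {
        for (const auto& [op, uops] : mc)
            if (uops.size() > MAX_UOPS_PER_OP) return false;
        return true;
    }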

Regards,
ZmnSCPxj


Re: [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks

2022-03-22 Thread Russell O'Connor via bitcoin-dev
Setting aside my thoughts that something like Simplicity would make a
better platform than Bitcoin Script (due to expressions operating on a
narrower interface than the entire stack (I'm looking at you OP_DEPTH)) there
is an issue with namespace management.

If I understand correctly, your implication was that once opcodes are
redefined by an OP_RETURN transaction, subsequent transactions of that
opcode refer to the new microcode.  But then we have a race
condition between people submitting transactions expecting the outputs to
refer to the old code and having their code redefined by the time they do
get confirmed (or, worse, having them reorged).

I've partially addressed this issue in my Simplicity design where the
commitment of a Simplicity program in a scriptPubKey covers the hash of the
specification of the jets used, which commits unambiguously to the
semantics (rightly or wrongly).  But the issue resurfaces at redemption
time where I (currently) have a consensus critical map of codes to jets
that is used to decode the witness data into a Simplicity program.  If one
were to allow this map of codes to jets to be replaced (rather than just
extended) then it would cause redemption to fail, because the hash of the
new jets would no longer match the hash of the jets appearing in the
input's scriptPubKey commitment.  While this is still not good and I don't
recommend it, it is probably better than letting the semantics of your
programs be changed out from under you.
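
As a sketch of the distinction (the real Simplicity commitment structure is 
more involved; the names here are hypothetical):

    #include <array>
    #include <cstdint>

    using Hash = std::array<std::uint8_t, 32>;

    struct RedemptionContext {
        Hash committed_jet_spec_hash;  // covered by the scriptPubKey
        Hash current_jet_map_hash;     // hash of the live code->jet map
    };

    // If the consensus-critical map of codes to jets is ever *replaced*
    // rather than extended, the recomputed hash no longer matches what
    // the input committed to, and redemption fails: the semantics are
    // never silently changed out from under the program.
    bool jets_match(const RedemptionContext& ctx) {
        return ctx.committed_jet_spec_hash == ctx.current_jet_map_hash;
    }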

This comment is not meant as an endorsement of this idea, which is a little
bit out there, at least as far as Bitcoin is concerned. :)

My long term plans are to move this consensus critical map of codes out of
the consensus layer and into the p2p layer where peers can negotiate their
own encodings between each other.  But that plan is also a little bit out
there, and it still doesn't solve the issue of how to weight reused jets,
where weight is still consensus critical.

On Tue, Mar 22, 2022 at 1:37 AM ZmnSCPxj via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Good morning list,
>
> It is entirely possible that I have gotten into the deep end and am now
> drowning in insanity, but here goes
>
> Subject: Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks
>
> Introduction
> ============
>
> Recent (Early 2022) discussions on the bitcoin-dev mailing
> list have largely focused on new constructs that enable new
> functionality.
>
> One general idea can be summarized this way:
>
> * We should provide a very general language.
>   * Then later, once we have learned how to use this language,
> we can softfork in new opcodes that compress sections of
> programs written in this general language.
>
> There are two arguments against this style:
>
> 1.  One of the most powerful arguments the "general" side of
> the "general v specific" debate is that softforks are
> painful because people are going to keep reiterating the
> activation parameters debate in a memoryless process, so
> we want to keep the number of softforks low.
> * So, we should just provide a very general language and
>   never softfork in any other change ever again.
> 2.  One of the most powerful arguments the "general" side of
> the "general v specific" debate is that softforks are
> painful because people are going to keep reiterating the
> activation parameters debate in a memoryless process, so
> we want to keep the number of softforks low.
> * So, we should just skip over the initial very general
>   language and individually activate small, specific
>   constructs, reducing the needed softforks by one.
>
> By taking a page from microprocessor design, it seems to me
> that we can use the same above general idea (a general base
> language where we later "bless" some sequence of operations)
> while avoiding some of the arguments against it.
>
> Digression: Microcodes In CISC Microprocessors
> ----------------------------------------------
>
> In the 1980s and 1990s, two competing microprocessor design
> paradigms arose:
>
> * Complex Instruction Set Computing (CISC)
>   - Few registers, many addressing/indexing modes, variable
> instruction length, many obscure instructions.
> * Reduced Instruction Set Computing (RISC)
>   - Many registers, usually only immediate and indexed
> addressing modes, fixed instruction length, few
> instructions.
>
> In CISC, the microprocessor provides very application-specific
> instructions, often with a small number of registers with
> specific uses.
> The instruction set was complicated, and often required
> multiple specific circuits for each application-specific
> instruction.
> Instructions had varying sizes and varying number of cycles.
>
> In RISC, the microprocessor provides fewer instructions, and
> programmers (or compilers) are supposed to generate the code
> for all application-specific needs.
> The processor provided large register banks which could be used very
> generically and interchangeably.

[bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks

2022-03-21 Thread ZmnSCPxj via bitcoin-dev
Good morning list,

It is entirely possible that I have gotten into the deep end and am now 
drowning in insanity, but here goes

Subject: Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks

Introduction
============

Recent (Early 2022) discussions on the bitcoin-dev mailing
list have largely focused on new constructs that enable new
functionality.

One general idea can be summarized this way:

* We should provide a very general language.
  * Then later, once we have learned how to use this language,
we can softfork in new opcodes that compress sections of
programs written in this general language.

There are two arguments against this style:

1.  One of the most powerful arguments the "general" side of
the "general v specific" debate is that softforks are
painful because people are going to keep reiterating the
activation parameters debate in a memoryless process, so
we want to keep the number of softforks low.
* So, we should just provide a very general language and
  never softfork in any other change ever again.
2.  One of the most powerful arguments the "general" side of
the "general v specific" debate is that softforks are
painful because people are going to keep reiterating the
activation parameters debate in a memoryless process, so
we want to keep the number of softforks low.
* So, we should just skip over the initial very general
  language and individually activate small, specific
  constructs, reducing the needed softforks by one.

By taking a page from microprocessor design, it seems to me
that we can use the same above general idea (a general base
language where we later "bless" some sequence of operations)
while avoiding some of the arguments against it.

Digression: Microcodes In CISC Microprocessors
----------------------------------------------

In the 1980s and 1990s, two competing microprocessor design
paradigms arose:

* Complex Instruction Set Computing (CISC)
  - Few registers, many addressing/indexing modes, variable
instruction length, many obscure instructions.
* Reduced Instruction Set Computing (RISC)
  - Many registers, usually only immediate and indexed
addressing modes, fixed instruction length, few
instructions.

In CISC, the microprocessor provides very application-specific
instructions, often with a small number of registers with
specific uses.
The instruction set was complicated, and often required
multiple specific circuits for each application-specific
instruction.
Instructions had varying sizes and varying number of cycles.

In RISC, the microprocessor provides fewer instructions, and
programmers (or compilers) are supposed to generate the code
for all application-specific needs.
The processor provided large register banks which could be
used very generically and interchangeably.
Instructions had the same size and every instruction took a
fixed number of cycles.

In CISC you usually had shorter code which could be written
by human programmers in assembly language or machine language.
In RISC, you generally had longer code, often difficult for
human programmers to write, and you *needed* a compiler to
generate it (unless you were very careful, or insane enough
you could scroll over multiple pages of instructions without
becoming more insane), or else you might forget about stuff
like jump slots.

For the most part, RISC lost, since most modern processors
today are x86 or x86-64, an instruction set with varying
instruction sizes, varying number of cycles per instruction,
and complex instructions with application-specific uses.

Or at least, it *looks like* RISC lost.
In the 90s, Intel was struggling since their big beefy CISC
designs were becoming too complicated.
Bugs got past testing and into mass-produced silicon.
RISC processors were beating the pants off 386s in terms of
raw number of computations per second.

RISC processors had the major advantage that they were
inherently simpler, due to having fewer specific circuits
and filling up their silicon with general-purpose registers
(which are large but very simple circuits) to compensate.
This meant that processor designers could fit more of the
design in their merely human meat brains, and were less
likely to make mistakes.
The fixed number of cycles per instruction made it trivial
to create a fixed-length pipeline for instruction processing,
and practical RISC processors could deliver one instruction
per clock cycle.
Worse, the simplicity of RISC meant that smaller and less
experienced teams could produce viable competitors to the
Intel x86s.

So what Intel did was to use a RISC processor, and add a
special Instruction Decoder unit.
The Instruction Decoder would take the CISC instruction
stream accepted by classic Intel x86 processors, and emit
RISC instructions for the internal RISC processor.
CISC instructions might be variable length and have variable
number of cycles, but the emitted RISC instructions were
individually