Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-31 Thread Sergio Lerner
Matt is right:  the goal is to prove digital copies of a public file.
Nothing more, nothing less.

Regarding the IP, I don't claim that every machine should provide the
protocol. Mobile phones shouldn't. But machines that want to be
prioritized in some way, or that want to be rewarded for hosting a node,
should use a fixed IP. That's the cost of prioritization/reward. The
protocol could be a service bit, advertised in the version message.

My response to your comment below:

On 27/03/2015 03:40 p.m., Jeremy Spilman wrote:

 It would be extremely impressive to achieve a reliable mechanism for 
 discerning a local copy exists under these constraints, particularly without 
 false positives and false negatives, and without imposing very substantial 
 one-time encoding costs, e.g. on par with doubling the verification cost. 
I see it differently. The asymmetric-time protocol is quite reliable. It
can be made to have almost no false positives/false negatives (not
considering rare communication problems, such as congestion and packet
loss lasting more than 5 seconds).
These are my back-of-the-envelope calculations:
Bitcoind takes approximately 1 second to serve a 1 MB block (seek time,
but mostly transfer time).
Then decryption of a block can take 150 msec without problem (15%
overhead). The last N blocks could be cached so they don't need to be
decrypted to be sent.
In 150 msec a PC can decrypt 1 MB of data split into 1024-bit chunks,
each decrypted by a modexp with exponent 3 (0.2 msec for 3 bigint
multiplications), so a full block can be decrypted in time.
Encrypting such a block would take approximately 15 seconds (which is
much less than the 10 minutes available to encrypt each block).
Then the protocol works with a security margin of approximately 50x.
A communication problem lasting 5 seconds would be needed to disturb a
protocol that takes 100 msec for the prover.
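
A rough Python sketch of that arithmetic (all figures are the assumptions
above, not benchmarks; the 300 msec response window is an assumed bound of
100 msec of prover work plus a jitter allowance):

serve_block_s    = 1.0    # serving a ~1 MB block (seek + transfer)
decrypt_block_s  = 0.150  # unkeying a block with public exponent 3
encrypt_block_s  = 15.0   # keying a block with the full-size exponent
block_interval_s = 600.0  # one new block roughly every 10 minutes
time_bound_s     = 0.3    # assumed response window (work + jitter)

# Honest nodes store keyed blocks; serving raw data costs ~15% extra.
print(f"decryption overhead: {decrypt_block_s / serve_block_s:.0%}")

# A cheater storing raw data must key each challenged block on the fly.
print(f"security margin: ~{encrypt_block_s / time_bound_s:.0f}x")

# Background keying easily keeps pace with the chain.
assert encrypt_block_s < block_interval_s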

Regards,
 Sergio.





Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-27 Thread Robert McKay
Basically the problem with that is that someone could set up a single
full node that has the blockchain and can answer those challenges, and
then a bunch of other non-full nodes that just proxy any such challenges
to the single full node.

Rob

On 2015-03-26 23:04, Matt Whitlock wrote:
 Maybe I'm overlooking something, but I've been watching this thread
 with increasing skepticism at the complexity of the offered solution.
 I don't understand why it needs to be so complex. I'd like to offer an
 alternative for your consideration...

 Challenge:
 Send me: SHA256(SHA256(concatenation of N pseudo-randomly selected
 bytes from the block chain)).

 Choose N such that it would be infeasible for the responding node to
 fetch all of the needed blocks in a short amount of time. In other
 words, assume that a node can seek to a given byte in a block stored
 on local disk much faster than it can download the entire block from a
 remote peer. This is almost certainly a safe assumption.

 For example, choose N = 1024. Then the proving node needs to perform
 1024 random reads from local disk. On spinning media, this is likely
 to take somewhere on the order of 15 seconds. Assuming blocks are
 averaging 500 KiB each, then 1024 blocks would comprise 500 MiB of
 data. Can 500 MiB be downloaded in 15 seconds? This data transfer rate
 is 280 Mbps. Almost certainly not possible. And if it is, just
 increase N. The challenge also becomes more difficult as average block
 size increases.

 This challenge-response protocol relies on the lack of a partial
 getdata command in the Bitcoin protocol: a node cannot ask for only
 part of a block; it must ask for an entire block. Furthermore, nodes
 could ban other nodes for making too many random requests for blocks.


 On Thursday, 26 March 2015, at 7:09 pm, Sergio Lerner wrote:

  If I understand correctly, transforming raw blocks to keyed blocks
  takes 512x longer than transforming keyed blocks back to raw. The key
  is public, like the IP, or some other value which perhaps changes less
  frequently.

 Yes. I was thinking that the IP could be part of a first layer of
 encryption done to the blockchain data prior to the asymmetric
 operation. That way the asymmetric operation can be the same for all
 users (no different primes for different IPs, and then the verifier
 does not have to verify that a particular p is actually a pseudo-prime
 suitable for P.H.) and the public exponent can be just 3.


  Two protocols can be performed to prove local possession:
  1. (prover and verifier pay a small cost) The verifier sends a seed to
  derive some n random indexes, and the prover must respond with the
  hash of the decrypted blocks within a certain time bound. Suppose that
  decryption of n blocks takes 100 msec (+-100 msec of network jitter).
  Then an attacker must have a computer 50x faster to be able to
  consistently cheat. The last 50 blocks should not be part of the list,
  to allow nodes to catch up and encrypt the blocks in the background.


  Can you clarify, the prover is hashing random blocks of *decrypted*,
  as-in raw, blockchain data? What does this prove other than, perhaps,
  fast random IO of the blockchain? (which is useful in its own right,
  e.g. as a way to ensure only full-node IO-bound mining if baked into
  the PoW)

  How is the verifier validating the response without possession of the
  full blockchain?

 You're right, it is incorrect. It is not the decrypted blocks that must
 be sent, but the encrypted blocks. The correct protocol is this:

 1. (prover and verifier pay a small cost) The verifier sends a seed to
 derive some n random indexes, and the prover must respond with the
 encrypted blocks within a certain time bound. The verifier decrypts
 those blocks to check if they are part of the block chain.

 But then there is this improvement, which allows the verifier to detect
 non-full-nodes with much less computation:

 3. (prover pays a small cost, verifier a smaller cost) The verifier asks
 the prover to send a Merkle tree root of hashes of encrypted blocks with
 N indexes selected by a pseudo-random function seeded by a challenge
 value, where each encrypted block is prefixed with the seed before being
 hashed (e.g. N=100). The verifier receives the Merkle root and performs
 a statistical test on the received information. From the N hashed
 blocks, it chooses M < N (e.g. M = 20), and asks the prover for the
 blocks at those indexes. The prover sends the blocks; the verifier
 validates the blocks by decrypting them and also verifies that the
 Merkle tree was well constructed for those block nodes. This proves with
 high probability that the Merkle tree was built on-the-fly and
 specifically for this challenge-response protocol.

  I also wonder about the effect of spinning disk versus SSD. Seek time
  for 1,000 random reads is either nearly zero or dominating depending
  on the two modes. I

Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-27 Thread Matt Whitlock
I agree that someone could do this, but why is that a problem? Isn't the goal 
of this exercise to ensure more full nodes on the network? In order to be able 
to answer the challenges, an entity would need to be running a full node 
somewhere. Thus, they have contributed at least one additional full node to the 
network. I could certainly see a case for a company to host hundreds of 
lightweight (e.g., EC2) servers all backed by a single copy of the block chain. 
Why force every single machine to have its own copy? All you really need to 
require is that each agency/participant have its own copy.


On Friday, 27 March 2015, at 2:32 pm, Robert McKay wrote:
 Basically the problem with that is that someone could set up a single
 full node that has the blockchain and can answer those challenges, and
 then a bunch of other non-full nodes that just proxy any such challenges
 to the single full node.
 
 Rob
 
 On 2015-03-26 23:04, Matt Whitlock wrote:
  Maybe I'm overlooking something, but I've been watching this thread
  with increasing skepticism at the complexity of the offered solution.
  I don't understand why it needs to be so complex. I'd like to offer an
  alternative for your consideration...

  Challenge:
  Send me: SHA256(SHA256(concatenation of N pseudo-randomly selected
  bytes from the block chain)).

  Choose N such that it would be infeasible for the responding node to
  fetch all of the needed blocks in a short amount of time. In other
  words, assume that a node can seek to a given byte in a block stored
  on local disk much faster than it can download the entire block from a
  remote peer. This is almost certainly a safe assumption.

  For example, choose N = 1024. Then the proving node needs to perform
  1024 random reads from local disk. On spinning media, this is likely
  to take somewhere on the order of 15 seconds. Assuming blocks are
  averaging 500 KiB each, then 1024 blocks would comprise 500 MiB of
  data. Can 500 MiB be downloaded in 15 seconds? This data transfer rate
  is 280 Mbps. Almost certainly not possible. And if it is, just
  increase N. The challenge also becomes more difficult as average block
  size increases.

  This challenge-response protocol relies on the lack of a partial
  getdata command in the Bitcoin protocol: a node cannot ask for only
  part of a block; it must ask for an entire block. Furthermore, nodes
  could ban other nodes for making too many random requests for blocks.
 
 
  On Thursday, 26 March 2015, at 7:09 pm, Sergio Lerner wrote:
 
   If I understand correctly, transforming raw blocks to keyed blocks
   takes 512x longer than transforming keyed blocks back to raw. The key
   is public, like the IP, or some other value which perhaps changes less
   frequently.

  Yes. I was thinking that the IP could be part of a first layer of
  encryption done to the blockchain data prior to the asymmetric
  operation. That way the asymmetric operation can be the same for all
  users (no different primes for different IPs, and then the verifier
  does not have to verify that a particular p is actually a pseudo-prime
  suitable for P.H.) and the public exponent can be just 3.


   Two protocols can be performed to prove local possession:
   1. (prover and verifier pay a small cost) The verifier sends a seed to
   derive some n random indexes, and the prover must respond with the
   hash of the decrypted blocks within a certain time bound. Suppose that
   decryption of n blocks takes 100 msec (+-100 msec of network jitter).
   Then an attacker must have a computer 50x faster to be able to
   consistently cheat. The last 50 blocks should not be part of the list,
   to allow nodes to catch up and encrypt the blocks in the background.


   Can you clarify, the prover is hashing random blocks of *decrypted*,
   as-in raw, blockchain data? What does this prove other than, perhaps,
   fast random IO of the blockchain? (which is useful in its own right,
   e.g. as a way to ensure only full-node IO-bound mining if baked into
   the PoW)

   How is the verifier validating the response without possession of the
   full blockchain?

  You're right, it is incorrect. It is not the decrypted blocks that must
  be sent, but the encrypted blocks. The correct protocol is this:

  1. (prover and verifier pay a small cost) The verifier sends a seed to
  derive some n random indexes, and the prover must respond with the
  encrypted blocks within a certain time bound. The verifier decrypts
  those blocks to check if they are part of the block chain.

  But then there is this improvement, which allows the verifier to detect
  non-full-nodes with much less computation:

  3. (prover pays a small cost, verifier a smaller cost) The verifier
  asks the prover to send a Merkle tree root of hashes of encrypted
  blocks with N indexes selected by a pseudo-random function seeded by a
  challenge value, where each

Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-27 Thread Robert McKay
The main motivation is to try and stop a single entity running lots of 
nodes in order to harvest transaction origin IPs. That's what's behind 
this.

Probably the efforts are a waste of time.. if someone has to keep a few
hundred copies of the blockchain around in order to keep IP-specific
precomputed data around for all the IPs they listen on, then they'll just
buy a handful of 5TB HDs and call it a day.. still, some of the ideas
proposed are quite interesting and might not have much downside.

Rob


On 2015-03-27 15:16, Matt Whitlock wrote:
 I agree that someone could do this, but why is that a problem? Isn't
 the goal of this exercise to ensure more full nodes on the network? In
 order to be able to answer the challenges, an entity would need to be
 running a full node somewhere. Thus, they have contributed at least
 one additional full node to the network. I could certainly see a case
 for a company to host hundreds of lightweight (e.g., EC2) servers all
 backed by a single copy of the block chain. Why force every single
 machine to have its own copy? All you really need to require is that
 each agency/participant have its own copy.


 On Friday, 27 March 2015, at 2:32 pm, Robert McKay wrote:
 Basically the problem with that is that someone could set up a single
 full node that has the blockchain and can answer those challenges, and
 then a bunch of other non-full nodes that just proxy any such challenges
 to the single full node.

 Rob

 On 2015-03-26 23:04, Matt Whitlock wrote:
  Maybe I'm overlooking something, but I've been watching this thread
  with increasing skepticism at the complexity of the offered solution.
  I don't understand why it needs to be so complex. I'd like to offer an
  alternative for your consideration...

  Challenge:
  Send me: SHA256(SHA256(concatenation of N pseudo-randomly selected
  bytes from the block chain)).

  Choose N such that it would be infeasible for the responding node to
  fetch all of the needed blocks in a short amount of time. In other
  words, assume that a node can seek to a given byte in a block stored
  on local disk much faster than it can download the entire block from a
  remote peer. This is almost certainly a safe assumption.

  For example, choose N = 1024. Then the proving node needs to perform
  1024 random reads from local disk. On spinning media, this is likely
  to take somewhere on the order of 15 seconds. Assuming blocks are
  averaging 500 KiB each, then 1024 blocks would comprise 500 MiB of
  data. Can 500 MiB be downloaded in 15 seconds? This data transfer rate
  is 280 Mbps. Almost certainly not possible. And if it is, just
  increase N. The challenge also becomes more difficult as average block
  size increases.

  This challenge-response protocol relies on the lack of a partial
  getdata command in the Bitcoin protocol: a node cannot ask for only
  part of a block; it must ask for an entire block. Furthermore, nodes
  could ban other nodes for making too many random requests for blocks.
 
 
  On Thursday, 26 March 2015, at 7:09 pm, Sergio Lerner wrote:
 
   If I understand correctly, transforming raw blocks to keyed blocks
   takes 512x longer than transforming keyed blocks back to raw. The key
   is public, like the IP, or some other value which perhaps changes less
   frequently.

  Yes. I was thinking that the IP could be part of a first layer of
  encryption done to the blockchain data prior to the asymmetric
  operation. That way the asymmetric operation can be the same for all
  users (no different primes for different IPs, and then the verifier
  does not have to verify that a particular p is actually a pseudo-prime
  suitable for P.H.) and the public exponent can be just 3.


   Two protocols can be performed to prove local possession:
   1. (prover and verifier pay a small cost) The verifier sends a seed to
   derive some n random indexes, and the prover must respond with the
   hash of the decrypted blocks within a certain time bound. Suppose that
   decryption of n blocks takes 100 msec (+-100 msec of network jitter).
   Then an attacker must have a computer 50x faster to be able to
   consistently cheat. The last 50 blocks should not be part of the list,
   to allow nodes to catch up and encrypt the blocks in the background.


   Can you clarify, the prover is hashing random blocks of *decrypted*,
   as-in raw, blockchain data? What does this prove other than, perhaps,
   fast random IO of the blockchain? (which is useful in its own right,
   e.g. as a way to ensure only full-node IO-bound mining if baked into
   the PoW)

   How is the verifier validating the response without possession of the
   full blockchain?

  You're right, it is incorrect. It is not the decrypted blocks that must
  be sent, but the encrypted blocks. The correct protocol is this:

  1. (prover and verifier pay a small cost) The verifier sends a seed to

Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-27 Thread Matt Whitlock
On Friday, 27 March 2015, at 4:57 pm, Wladimir J. van der Laan wrote:
 On Fri, Mar 27, 2015 at 11:16:43AM -0400, Matt Whitlock wrote:
  I agree that someone could do this, but why is that a problem? Isn't the 
  goal of this exercise to ensure more full nodes on the network? In order to 
  be able to answer the challenges, an entity would need to be running a full 
  node somewhere. Thus, they have contributed at least one additional full 
  node to the network. I could certainly see a case for a company to host 
  hundreds of lightweight (e.g., EC2) servers all backed by a single copy of 
  the block chain. Why force every single machine to have its own copy? All 
  you really need to require is that each agency/participant have its own 
  copy.
 
 They would not even have to run one. It could just pass the query to a random 
 other node, and forward its result :)

D'oh. Of course. Thanks. :/

The suggestion about encrypting blocks with a key tied to IP address seems like 
a bad idea, though. Lots of nodes are on dynamic IP addresses. It wouldn't 
really be practical to re-encrypt the entire block chain every time a node's IP 
address changes.



Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-27 Thread Matt Whitlock
On Friday, 27 March 2015, at 4:57 pm, Wladimir J. van der Laan wrote:
 On Fri, Mar 27, 2015 at 11:16:43AM -0400, Matt Whitlock wrote:
  I agree that someone could do this, but why is that a problem? Isn't the 
  goal of this exercise to ensure more full nodes on the network? In order to 
  be able to answer the challenges, an entity would need to be running a full 
  node somewhere. Thus, they have contributed at least one additional full 
  node to the network. I could certainly see a case for a company to host 
  hundreds of lightweight (e.g., EC2) servers all backed by a single copy of 
  the block chain. Why force every single machine to have its own copy? All 
  you really need to require is that each agency/participant have its own 
  copy.
 
 They would not even have to run one. It could just pass the query to a random 
 other node, and forward its result :)

Ah, easy way to fix that. In fact, in my first draft of my suggestion, I had 
the answer, but I removed it because I thought it was superfluous.

Challenge:
Send me: SHA256(SHA256(concatenation of N pseudo-randomly selected bytes from 
the block chain | prover's nonce | verifier's nonce)).

The nonces are from the version messages exchanged at connection startup. A 
node can't pass the buck because it can't control the nonce that a random other 
node chooses.
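
A minimal Python sketch of this nonce-bound response (hashlib only;
selected_bytes stands for the N pseudo-randomly selected bytes from the
block chain, gathered as before):

import hashlib

def respond_bound(selected_bytes: bytes, prover_nonce: bytes,
                  verifier_nonce: bytes) -> bytes:
    # Binding both version-message nonces into the preimage means a
    # response computed for this link is useless on any other link.
    preimage = selected_bytes + prover_nonce + verifier_nonce
    return hashlib.sha256(hashlib.sha256(preimage).digest()).digest()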



Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-27 Thread Jeremy Spilman

 On Mar 27, 2015, at 8:16 AM, Matt Whitlock <b...@mattwhitlock.name> wrote:
 
 Isn't the goal of this exercise to ensure more full nodes on the network?

Basically we're talking about a form of Sybil defense and better quantifying 
true blockchain resiliency by proof of storage.

In this case the goal is to see if we can prove the number of distinct digital 
copies of the blockchain. This is actually a tricky problem because it will 
(always?) devolve to inferences from response timing, and we are running over a 
heterogeneous network with heterogeneous machines.

It would be extremely impressive to achieve a reliable mechanism for discerning 
a local copy exists under these constraints, particularly without false 
positives and false negatives, and without imposing very substantial one-time 
encoding costs, e.g. on par with doubling the verification cost. 

I think that while it's a difficult cost-benefit analysis, even code complexity 
aside, it's interesting to discuss all the same!

Simply having many unique IP addresses possibly accessing the same unique copy 
provides a different (if any) benefit. E.g. Tor uses IPs as a cost factor, but 
(until recently?) didn't even factor in things like them all being in the same 
Class C.



Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-27 Thread Thy Shizzle
If the IP discovery is your main motivation, why don't you introduce some onion 
routing into transactions? That would solve this problem easily. Of course, 
there is an overhead which will slightly slow down the relay of transactions, 
but not significantly. Also, make it an option rather than enforced, for those 
worried about IP association.

From: Robert McKay <rob...@mckay.com>
Sent: 28/03/2015 2:33 AM
To: Matt Whitlock <b...@mattwhitlock.name>
Cc: bitcoin-development@lists.sourceforge.net
Subject: Re: [Bitcoin-development] network disruption as a service and proof 
of local storage

The main motivation is to try and stop a single entity running lots of
nodes in order to harvest transaction origin IPs. That's what's behind
this.

Probably the efforts are a waste of time.. if someone has to keep a few
hundred copies of the blockchain around in order to keep IP-specific
precomputed data around for all the IPs they listen on, then they'll just
buy a handful of 5TB HDs and call it a day.. still, some of the ideas
proposed are quite interesting and might not have much downside.

Rob


On 2015-03-27 15:16, Matt Whitlock wrote:
 I agree that someone could do this, but why is that a problem? Isn't
 the goal of this exercise to ensure more full nodes on the network? In
 order to be able to answer the challenges, an entity would need to be
 running a full node somewhere. Thus, they have contributed at least
 one additional full node to the network. I could certainly see a case
 for a company to host hundreds of lightweight (e.g., EC2) servers all
 backed by a single copy of the block chain. Why force every single
 machine to have its own copy? All you really need to require is that
 each agency/participant have its own copy.


 On Friday, 27 March 2015, at 2:32 pm, Robert McKay wrote:
 Basically the problem with that is that someone could set up a single
 full node that has the blockchain and can answer those challenges, and
 then a bunch of other non-full nodes that just proxy any such challenges
 to the single full node.

 Rob

 On 2015-03-26 23:04, Matt Whitlock wrote:
  Maybe I'm overlooking something, but I've been watching this thread
  with increasing skepticism at the complexity of the offered solution.
  I don't understand why it needs to be so complex. I'd like to offer an
  alternative for your consideration...

  Challenge:
  Send me: SHA256(SHA256(concatenation of N pseudo-randomly selected
  bytes from the block chain)).

  Choose N such that it would be infeasible for the responding node to
  fetch all of the needed blocks in a short amount of time. In other
  words, assume that a node can seek to a given byte in a block stored
  on local disk much faster than it can download the entire block from a
  remote peer. This is almost certainly a safe assumption.

  For example, choose N = 1024. Then the proving node needs to perform
  1024 random reads from local disk. On spinning media, this is likely
  to take somewhere on the order of 15 seconds. Assuming blocks are
  averaging 500 KiB each, then 1024 blocks would comprise 500 MiB of
  data. Can 500 MiB be downloaded in 15 seconds? This data transfer rate
  is 280 Mbps. Almost certainly not possible. And if it is, just
  increase N. The challenge also becomes more difficult as average block
  size increases.

  This challenge-response protocol relies on the lack of a partial
  getdata command in the Bitcoin protocol: a node cannot ask for only
  part of a block; it must ask for an entire block. Furthermore, nodes
  could ban other nodes for making too many random requests for blocks.
 
 
  On Thursday, 26 March 2015, at 7:09 pm, Sergio Lerner wrote:
 
   If I understand correctly, transforming raw blocks to keyed blocks
   takes 512x longer than transforming keyed blocks back to raw. The key
   is public, like the IP, or some other value which perhaps changes less
   frequently.

  Yes. I was thinking that the IP could be part of a first layer of
  encryption done to the blockchain data prior to the asymmetric
  operation. That way the asymmetric operation can be the same for all
  users (no different primes for different IPs, and then the verifier
  does not have to verify that a particular p is actually a pseudo-prime
  suitable for P.H.) and the public exponent can be just 3.


   Two protocols can be performed to prove local possession:
   1. (prover and verifier pay a small cost) The verifier sends a seed to
   derive some n random indexes, and the prover must respond with the
   hash of the decrypted blocks within a certain time bound. Suppose that
   decryption of n blocks takes 100 msec (+-100 msec of network jitter).
   Then an attacker must have a computer 50x faster to be able to
   consistently cheat. The last 50 blocks should not be part of the list,
   to allow nodes to catch up and encrypt the blocks in

Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-26 Thread Sergio Lerner

 If I understand correctly, transforming raw blocks to keyed blocks
 takes 512x longer than transforming keyed blocks back to raw. The key
 is public, like the IP, or some other value which perhaps changes less
 frequently.

Yes. I was thinking that the IP could be part of a first layer of
encryption done to the blockchain data prior to the asymmetric operation.
That way the asymmetric operation can be the same for all users (no
different primes for different IPs, and then the verifier does not
have to verify that a particular p is actually a pseudo-prime suitable
for P.H.) and the public exponent can be just 3.
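
To make the computational asymmetry concrete, here is a toy Python sketch
of a Pohlig-Hellman-style exponentiation cipher as described above. The
20-bit prime is illustrative only (real parameters would be ~1024-bit),
and nothing here is secret; the asymmetry is purely computational:

# Unkeying raises to e = 3 (2 bigint multiplications); keying raises to
# a full-size exponent d (~1.5 * log2(p) multiplications via
# square-and-multiply), on the order of the 512x gap quoted above.
p = 999983                 # toy prime with gcd(3, p - 1) == 1
e = 3
d = pow(e, -1, p - 1)      # keying exponent; anyone can derive it

def key_chunk(m: int) -> int:      # slow: done once, in the background
    return pow(m, d, p)

def unkey_chunk(c: int) -> int:    # fast: done while serving a block
    return pow(c, e, p)

assert unkey_chunk(key_chunk(123456)) == 123456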


 Two protocols can be performed to prove local possession:
 1. (prover and verifier pay a small cost) The verifier sends a seed to
 derive some n random indexes, and the prover must respond with the hash
 of the decrypted blocks within a certain time bound. Suppose that
 decryption of n blocks takes 100 msec (+-100 msec of network jitter).
 Then an attacker must have a computer 50x faster to be able to
 consistently cheat. The last 50 blocks should not be part of the list,
 to allow nodes to catch up and encrypt the blocks in the background.


 Can you clarify, the prover is hashing random blocks of *decrypted*,
 as-in raw, blockchain data? What does this prove other than, perhaps,
 fast random IO of the blockchain? (which is useful in its own right,
 e.g. as a way to ensure only full-node IO-bound mining if baked into
 the PoW)

 How is the verifier validating the response without possession of the
 full blockchain?

You're right, it is incorrect. It is not the decrypted blocks that must
be sent, but the encrypted blocks. The correct protocol is this:

1. (prover and verifier pay a small cost) The verifier sends a seed to
derive some n random indexes, and the prover must respond with the
encrypted blocks within a certain time bound. The verifier decrypts
those blocks to check if they are part of the block chain.
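
A sketch of the verifier's side of protocol 1, under the timing
assumptions discussed elsewhere in the thread; peer.request_keyed_blocks,
unkey and is_chain_block are hypothetical hooks for the connection, the
cheap e=3 decryption, and chain validation:

import time

def verify_protocol_1(peer, seed: bytes, unkey, is_chain_block,
                      time_bound_s: float = 0.3) -> bool:
    start = time.monotonic()
    keyed_blocks = peer.request_keyed_blocks(seed)  # hypothetical RPC
    if time.monotonic() - start > time_bound_s:
        return False        # too slow: likely keying raw data on the fly
    # Decryption is cheap (e = 3); check each block really is in the chain.
    return all(is_chain_block(unkey(b)) for b in keyed_blocks)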

But then there is this improvement, which allows the verifier to detect
non-full-nodes with much less computation:

3. (prover pays a small cost, verifier a smaller cost) The verifier asks
the prover to send a Merkle tree root of hashes of encrypted blocks with
N indexes selected by a pseudo-random function seeded by a challenge
value, where each encrypted block is prefixed with the seed before being
hashed (e.g. N=100). The verifier receives the Merkle root and performs
a statistical test on the received information. From the N hashed
blocks, it chooses M < N (e.g. M = 20), and asks the prover for the
blocks at those indexes. The prover sends the blocks; the verifier
validates the blocks by decrypting them and also verifies that the
Merkle tree was well constructed for those block nodes. This proves with
high probability that the Merkle tree was built on-the-fly and
specifically for this challenge-response protocol.
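
A Python sketch of protocol 3, simplified so the prover reveals all N
leaf hashes rather than individual Merkle paths; random.Random stands in
for the pseudo-random function, and fetch_block, unkey and is_chain_block
are hypothetical hooks:

import hashlib
import random

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])   # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

def derive_indexes(seed: bytes, n: int, chain_len: int):
    rng = random.Random(seed)         # stand-in for an agreed PRF
    return [rng.randrange(chain_len) for _ in range(n)]

# Prover: commit to seed-prefixed hashes of keyed blocks at N indexes.
def prove(keyed_blocks, seed: bytes, n: int = 100):
    idxs = derive_indexes(seed, n, len(keyed_blocks))
    leaves = [h(seed + keyed_blocks[i]) for i in idxs]
    return merkle_root(leaves), leaves

# Verifier: check the committed root, then spot-check M < N full blocks.
def verify(root, leaves, seed: bytes, chain_len: int,
           fetch_block, unkey, is_chain_block, n: int = 100, m: int = 20):
    if len(leaves) != n or merkle_root(leaves) != root:
        return False
    idxs = derive_indexes(seed, n, chain_len)
    for pos in random.sample(range(n), m):
        keyed = fetch_block(pos)              # prover sends this block
        if h(seed + keyed) != leaves[pos]:    # matches the commitment?
            return False
        if not is_chain_block(unkey(keyed), idxs[pos]):
            return False
    return True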

 I also wonder about the effect of spinning disk versus SSD. Seek time
 for 1,000 random reads is either nearly zero or dominating depending
 on the two modes. I wonder if a sequential read from a random index is
 a possible trade-off; it doesn't prove possession of the whole chain
 nearly as well, but at least iowait converges significantly. Then
 again, that presupposes a specific ordering on disk which might not
 exist. In X years it will all be solid-state, so eventually it's moot.

Good idea.

Also, we don't need every node to implement the protocol, only nodes
that want to prove full-node-ness, such as the ones which want to
receive the bitnodes subsidy.





Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-26 Thread Matt Whitlock
Maybe I'm overlooking something, but I've been watching this thread with 
increasing skepticism at the complexity of the offered solution. I don't 
understand why it needs to be so complex. I'd like to offer an alternative for 
your consideration...

Challenge:
Send me: SHA256(SHA256(concatenation of N pseudo-randomly selected bytes from 
the block chain)).

Choose N such that it would be infeasible for the responding node to fetch all 
of the needed blocks in a short amount of time. In other words, assume that a 
node can seek to a given byte in a block stored on local disk much faster than 
it can download the entire block from a remote peer. This is almost certainly a 
safe assumption.

For example, choose N = 1024. Then the proving node needs to perform 1024 
random reads from local disk. On spinning media, this is likely to take 
somewhere on the order of 15 seconds. Assuming blocks are averaging 500 KiB 
each, then 1024 blocks would comprise 500 MiB of data. Can 500 MiB be 
downloaded in 15 seconds? This data transfer rate is 280 Mbps. Almost certainly 
not possible. And if it is, just increase N. The challenge also becomes more 
difficult as average block size increases.

This challenge-response protocol relies on the lack of a partial getdata 
command in the Bitcoin protocol: a node cannot ask for only part of a block; it 
must ask for an entire block. Furthermore, nodes could ban other nodes for 
making too many random requests for blocks.
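
A minimal sketch of the responder's side in Python, treating the stored
chain as one flat byte array; random.Random stands in for whatever
pseudo-random derivation the protocol would actually specify:

import hashlib
import random

def respond(chain_path: str, chain_size: int, seed: bytes,
            n: int = 1024) -> bytes:
    rng = random.Random(seed)
    offsets = [rng.randrange(chain_size) for _ in range(n)]
    data = bytearray()
    with open(chain_path, 'rb') as f:
        for off in offsets:           # N random single-byte reads
            f.seek(off)
            data += f.read(1)
    # SHA256(SHA256(concatenation of the N selected bytes))
    return hashlib.sha256(hashlib.sha256(bytes(data)).digest()).digest()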


On Thursday, 26 March 2015, at 7:09 pm, Sergio Lerner wrote:
 
  If I understand correctly, transforming raw blocks to keyed blocks
  takes 512x longer than transforming keyed blocks back to raw. The key
  is public, like the IP, or some other value which perhaps changes less
  frequently.

 Yes. I was thinking that the IP could be part of a first layer of
 encryption done to the blockchain data prior to the asymmetric operation.
 That way the asymmetric operation can be the same for all users (no
 different primes for different IPs, and then the verifier does not
 have to verify that a particular p is actually a pseudo-prime suitable
 for P.H.) and the public exponent can be just 3.


  Two protocols can be performed to prove local possession:
  1. (prover and verifier pay a small cost) The verifier sends a seed to
  derive some n random indexes, and the prover must respond with the hash
  of the decrypted blocks within a certain time bound. Suppose that
  decryption of n blocks takes 100 msec (+-100 msec of network jitter).
  Then an attacker must have a computer 50x faster to be able to
  consistently cheat. The last 50 blocks should not be part of the list,
  to allow nodes to catch up and encrypt the blocks in the background.


  Can you clarify, the prover is hashing random blocks of *decrypted*,
  as-in raw, blockchain data? What does this prove other than, perhaps,
  fast random IO of the blockchain? (which is useful in its own right,
  e.g. as a way to ensure only full-node IO-bound mining if baked into
  the PoW)

  How is the verifier validating the response without possession of the
  full blockchain?

 You're right, it is incorrect. It is not the decrypted blocks that must
 be sent, but the encrypted blocks. The correct protocol is this:

 1. (prover and verifier pay a small cost) The verifier sends a seed to
 derive some n random indexes, and the prover must respond with the
 encrypted blocks within a certain time bound. The verifier decrypts
 those blocks to check if they are part of the block chain.

 But then there is this improvement, which allows the verifier to detect
 non-full-nodes with much less computation:

 3. (prover pays a small cost, verifier a smaller cost) The verifier asks
 the prover to send a Merkle tree root of hashes of encrypted blocks with
 N indexes selected by a pseudo-random function seeded by a challenge
 value, where each encrypted block is prefixed with the seed before being
 hashed (e.g. N=100). The verifier receives the Merkle root and performs
 a statistical test on the received information. From the N hashed
 blocks, it chooses M < N (e.g. M = 20), and asks the prover for the
 blocks at those indexes. The prover sends the blocks; the verifier
 validates the blocks by decrypting them and also verifies that the
 Merkle tree was well constructed for those block nodes. This proves with
 high probability that the Merkle tree was built on-the-fly and
 specifically for this challenge-response protocol.

  I also wonder about the effect of spinning disk versus SSD. Seek time
  for 1,000 random reads is either nearly zero or dominating depending
  on the two modes. I wonder if a sequential read from a random index is
  a possible trade-off; it doesn't prove possession of the whole chain
  nearly as well, but at least iowait converges significantly. Then
  again, that presupposes a specific ordering on disk which might not
  exist. In X years it will all be solid-state, so eventually it's moot.

 Good idea.
 
 

Re: [Bitcoin-development] network disruption as a service and proof of local storage

2015-03-24 Thread Jeremy Spilman
On Mon, 16 Mar 2015 09:29:03 -0700, Sergio Lerner
<sergioler...@certimix.com> wrote:
 I proposed a (what I think) is better protocol for Proof of Storage that
 I call Proof of Local storage here
 https://bitslog.wordpress.com/2014/11/03/proof-of-local-blockchain-storage/

Thanks so much for publishing this. It could be useful in any application  
to try to prove a keyed copy of some data.

If I understand correctly, transforming raw blocks to keyed blocks takes  
512x longer than transforming keyed blocks back to raw. The key is public,  
like the IP, or some other value which perhaps changes less frequently.

The prover keeps blocks in the keyed format, and can decrypt quickly to
provide raw data, or use the keyed data for hashing to try to demonstrate
they have a pre-keyed copy.


 Two protocols can be performed to prove local possession:
 1. (prover and verifier pay a small cost) The verifier sends a seed to
 derive some n random indexes, and the prover must respond with the hash
 of the decrypted blocks within a certain time bound. Suppose that
 decryption of n blocks takes 100 msec (+-100 msec of network jitter).
 Then an attacker must have a computer 50x faster to be able to
 consistently cheat. The last 50 blocks should not be part of the list,
 to allow nodes to catch up and encrypt the blocks in the background.


Can you clarify, the prover is hashing random blocks of *decrypted*, as-in  
raw, blockchain data? What does this prove other than, perhaps, fast  
random IO of the blockchain? (which is useful in its own right, e.g. as a  
way to ensure only full-node IO-bound mining if baked into the PoW)

How is the verifier validating the response without possession of the full  
blockchain?

 2. (prover pays a high cost, verifier pays negligible cost). The verifier
 chooses a seed n, and then pre-computes the encrypted blocks derived
 from the seed using the prover's IP. Then the verifier sends the seed,
 and the prover must respond with the hash of the encrypted blocks within
 a certain time bound. The prover does not need to do any P.H.
 decryption, just take the encrypted blocks at indexes derived from the
 seed, hash them and send the hash back to the verifier. The verifier
 validates the time bound and the hash.

The challenger requests a hash-sum of a random sequence of indices of the  
keyed data, based on a challenge seed. So in a few bytes round-trip we can  
see how fast the computation is completed. If the data is already keyed,  
the hash of 1,000 random 1024-bit blocks should come back much faster than  
if the data needs to be keyed on-the-fly.

To verify the response, the challenger would have to use the peer's  
identity key and perform the slower transforms on those same 1,000 blocks  
and see that the result matches, so cost to challenger is higher than  
prover, assuming they actually do the computation.

Which brings up a good tweak, a full-node challenger could have to do the  
computation first, then also include something like HMAC(identityKey,  
expectedResult). The prover could then know if the challenger was honest  
before returning a result, and blacklist them if not.
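
A sketch of that tweak in Python; identity_key is whatever per-peer key
is assumed here, and the challenger computes expected itself before
issuing the challenge:

import hashlib
import hmac

def make_challenge(identity_key: bytes, seed: bytes, expected: bytes):
    # Commit to the expected result without revealing it up front.
    tag = hmac.new(identity_key, expected, hashlib.sha256).digest()
    return seed, tag

def prover_accepts(identity_key: bytes, tag: bytes, result: bytes) -> bool:
    # After computing its own result, the prover can check that the
    # challenger really did the work; if not, blacklist the challenger.
    expected_tag = hmac.new(identity_key, result, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected_tag)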


 Both protocols can be made available by the client, under different
 states. For instance, new nodes are only allowed to request protocol 2
 (and so they get an initial assurance they are connecting to
 full-nodes). After a first-time mutual authentication, they are allowed
 to periodically perform protocol 1. Also new nodes may be allowed to
 perform protocol 1 with a small index set, and increase the index set
 over time, to get higher confidence.

I guess a new node could see if different servers all returned the same
challenge response, but they would have no way to know if the challenge
response was technically correct, or Sybil.

I also wonder about the effect of spinning disk versus SSD. Seek time for
1,000 random reads is either nearly zero or dominating depending on the
two modes. I wonder if a sequential read from a random index is a possible
trade-off; it doesn't prove possession of the whole chain nearly as well,
but at least iowait converges significantly. Then again, that presupposes
a specific ordering on disk which might not exist. In X years it will all
be solid-state, so eventually it's moot.

