Re: [Bitcoin-development] LevelDB benchmarking

2012-06-19 Thread Stefan Thomas
Here are my 2 cents after using LevelDB as the default backend for
BitcoinJS for about a year.

LevelDB was written to power IndexedDB in Chrome which is a JavaScript
API. That means that LevelDB doesn't really give you a lot of options,
because they assume that on the C++ layer you don't know any more than
they do, because the actual application is on the JavaScript layer. For
example whereas BDB supports hashtables, b-trees, queues, etc., LevelDB
uses one database type, LSM trees which is an ordered data structure
that is pretty good at everything.

Another gotcha was the number of file descriptors, LevelDB defaults to
1000 per DB. We originally used multiple DBs, one for each of the
indices, but it was easy enough to combine everything into one table,
thereby solving the fd issue. (Lowering the file descriptor limit also
works of course, but if you lower it too much, LevelDB will start to
spend a lot of time opening and closing files, so I believe combining
your tables into one is the better option.)

Overall, LevelDB is a fantastic solution for desktop software that is
faced with multiple use cases that aren't known at compile time. It
isn't really designed for something like Bitcoin which doesn't need
ordered access, has relatively predictable characteristics and - at
least some of the time - runs on servers.

That said, it does seem to work well for the Bitcoin use case anyway.
Thanks to the LSM trees, It's very quick at doing bulk inserts and we
don't seem to need any of the bells and whistles that BDB offers. So I
can't think of a reason not to switch, just make sure you all understand
the deal, LevelDB unlike Tokyo/Kyoto Cabinet is *not* intended as a
competitor or replacement for BDB, it's something quite different.



On 6/19/2012 6:06 PM, Mike Hearn wrote:
>> What problem does it solve?
> Primarily that block verification and therefore propagation is too
> slow because it's very CPU and IO intensive. The CPU work can be
> multi-threaded. The IO work, not as much. As Bitcoin grows we need to
> scale the nodes. Eventually there may be multi-machine nodes, but for
> now we can buy more time by making the existing nodes faster.
>
> I don't see this as a replacement for moving users to SPV clients.
> Obviously, otherwise I would not be writing one ;)
>
>> If the problem it will solve is the "too easy to get a DB_RUNRECOVERY
>> error" because bdb is fragile when it comes to its environment... then
>> LevelDB looks very interesting.
> I have no experience with how robust LevelDB is. It has an API call to
> try and repair the database and I know from experience that BigTable
> is pretty solid. But that doesn't mean LevelDB is.
>
>> If the problem is bdb is creaky and old and has obscure semantics and
>> a hard-to-work-with API, then yes, lets switch (I'm easily seduced by
>> a pretty API and blazing fast performance).
> The code is a lot simpler for sure.
>
>> As long as it compiles and runs on mac/windows/linux that doesn't
>> really worry me.
> It was refactored out of BigTable and made standalone for usage in
> Chrome. Therefore it's as portable as Chrome is. Mac/Windows/Linux
> should all work. Solaris, I believe, may need 64 bit binaries to avoid
> low FD limits.
>
>> Lack of infrastructure because it is new does worry me; for example,
>> could I rework bitcointools to read the LevelDB blockchain?  (are
>> there python bindings for LevelDB?)
> Yes: http://code.google.com/p/py-leveldb/
>
> First look at the code is here, but it's not ready for a pull req yet,
> and I'll force push over it a few times to get it into shape. So don't
> branch:
>
> https://github.com/mikehearn/bitcoin/commit/2b601dd4a0093f834084241735d84d84e484f183
>
> It has misc other changes I made whilst profiling, isn't well
> commented enough, etc.
>
> --
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and 
> threat landscape has changed and how IT managers can respond. Discussions 
> will include endpoint security, mobile security and the latest in malware 
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> ___
> Bitcoin-development mailing list
> Bitcoin-development@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bitcoin-development



--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients

2012-06-19 Thread Matt Corallo
On Sat, 2012-06-16 at 10:27 +0200, Mike Hearn wrote:
> > I'd much rather have an overloaded node respond with 50% fp rate filters
> > as an option if there aren't many full nodes available than simply
> > disconnect SPV clients.
> 
> I don't think the bloom filter settings have any impact on server-side
> load ... a node still has to check every transaction against the
> filter regardless of how that filter is configured, which means the
> same amount of disk io and processing.
> 
> How can you reduce load on a peer by negotiating different filter settings?
Agreed, I was largely giving a reason why one may want to negotiate the
filter settings in response to your question as to why it was done.  As
long as there are sane limits (you cant make a 1GB filter by specifying
0% fp and some crazy number of entires), filter negotiation largely isnt
worth it (also prevents any floats from appearing in the p2p protocol,
though in either case it shouldn't be able to cause issues).

Matt


--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node

2012-06-19 Thread Alan Reiner

On 06/19/2012 02:18 PM, Mark Friedenbach wrote:
On Tue, Jun 19, 2012 at 10:33 AM, Alan Reiner > wrote:


If we were to use a raw trie structure, then we'd have all the above
issues solved:  a trie has the same configuration no matter how
elements
are inserted or deleted, and accesses to elements in the tree are
constant time -- O(1).  There is no such thing as an unbalanced trie.
But overall space-efficiency is an issue.

A PATRICIA tree/trie would be ideal, in my mind, as it also has a
completely deterministic structure, and is an order-of-magnitude more
space-efficient.  Insert, delete and query times are still O(1).
However, it is not a trivial implementation.  I have occasionally
looked
for implementations, but not found any that were satisfactory.


No, a trie of any sort is dependent upon distribution of input data 
for balancing. As Peter Todd points out, a malicious actor could 
construct transaction or address hashes in such a way as to grow some 
segment of the trie in an unbalanced fashion. It's not much of an 
attack, but in principle exploitable under particular timing-sensitive 
circumstances.


Self-balancing search trees (KVL, RB, 2-3-4, whatever) don't suffer 
from this problem.


Mark


I was using "unbalanced" to refer to "query time" (and also 
insert/delete time).  If your trie nodes branch based on the next byte 
of your key hash, then the max depth of your trie is 32.  Period.  No 
one can do anything to ever make you do more than 32 hops to 
find/insert/delete your data.   And if you're using a raw trie, you'll 
always use /exactly/ 32 hops regardless of the distribution of the 
underlying data.  Hence, the trie structure is deterministic 
(history-independent) and cannot become unbalanced in terms of access time.


My first concern was that a malicious actor could linearize parts of the 
tree and cause access requests to take much longer than log(N) time.  
With the trie, that's not only impossible, you're actually accessing in 
O(1) time.


However, you are right that disk space can be affected by a malicious 
actor.  The more branching he can induce, the more branch nodes that are 
created to support branches with only one leaf.



--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node

2012-06-19 Thread Andrew Miller
Alan Reiner wrote:
> A PATRICIA tree/trie would be ideal, in my mind, as it also has a
> completely deterministic structure, and is an order-of-magnitude more
> space-efficient.  Insert, delete and query times are still O(1).
> However, it is not a trivial implementation.  I have occasionally looked
> for implementations, but not found any that were satisfactory.

PATRICIA Tries (aka Radix trees) have worst-case O(k), where k is the
number of bits in the key. Notice that since we would storing k-bit
hashes, the number of elements must be less than 2^k, or else by
birthday paradox we would have a hash collision! So O(log N) <= O(k).

You're right, though, that such a trie would have the property that
any two trees containing the same data (leaves) will be identical. I
can't think of any reason why this is useful, although I am hoping we
can figure out what is triggering your intuition to desire this! I am
indeed assuming that the tree will be incrementally constructed
according to the canonical (blockchain) ordering of transactions, and
that the balancing rules are agreed on as part of the protocol.

-- 
Andrew Miller

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node

2012-06-19 Thread Mark Friedenbach
On Tue, Jun 19, 2012 at 10:33 AM, Alan Reiner  wrote:

> I hope that someone else here would chime in on the issue raised in the
> thread, about using a tree-structure that has multiple valid
> configurations for the same set of unspent-TxOuts.  If you use any
> binary tree, you must replay the entire history of insertions and
> deletions in the correct order to get the tree structure and correct
> root.  Along those lines, using something like a red-black tree, while
> theoretically well-known, could be subject to implementation errors.
> One implementation of a red-black tree may do the rebalancing
> differently, and still work for it's intended purpose in the majority of
> applications where it doesn't matter.  One app developer updates their
> RB tree code which updated the RB-tree optimizations/rebalancing, and
> now a significant portion of the network can't agree on the correct
> root.  Not only would that be disruptive, it would be a disaster to
> track down.
>

Then use a 2-3-4 tree (aka self-balancing B-tree of order 4), which is a
generalization of RB-trees that doesn't allow for implementation choices in
balancing (assuming ordered insertion and deletion).

As gmaxwell points out, this is an trivially fixable 'problem'. Choose a
standard, mandate it, and write test cases.

If we were to use a raw trie structure, then we'd have all the above
> issues solved:  a trie has the same configuration no matter how elements
> are inserted or deleted, and accesses to elements in the tree are
> constant time -- O(1).  There is no such thing as an unbalanced trie.
> But overall space-efficiency is an issue.
>
> A PATRICIA tree/trie would be ideal, in my mind, as it also has a
> completely deterministic structure, and is an order-of-magnitude more
> space-efficient.  Insert, delete and query times are still O(1).
> However, it is not a trivial implementation.  I have occasionally looked
> for implementations, but not found any that were satisfactory.
>

No, a trie of any sort is dependent upon distribution of input data for
balancing. As Peter Todd points out, a malicious actor could construct
transaction or address hashes in such a way as to grow some segment of the
trie in an unbalanced fashion. It's not much of an attack, but in principle
exploitable under particular timing-sensitive circumstances.

Self-balancing search trees (KVL, RB, 2-3-4, whatever) don't suffer from
this problem.

Mark
--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node

2012-06-19 Thread Alan Reiner

On 06/19/2012 01:59 PM, Gregory Maxwell wrote:

On Tue, Jun 19, 2012 at 1:33 PM, Alan Reiner  wrote:

  One app developer updates their
RB tree code which updated the RB-tree optimizations/rebalancing, and
now a significant portion of the network can't agree on the correct
root.  Not only would that be disruptive, it would be a disaster to
track down.

This is why good comprehensive tests and a well specified algorithim
are important. The tree update algorithm would be normative in that
scheme. Worrying that implementers might get it wrong would be like
worrying that they'd get SHA256 wrong.


The point is not that they get it *wrong*, it's that the implement it 
*differently*.  Given a set of 100 TxOuts, there's a seemingly-infinite 
number of ways to construct a binary tree.  Put them in in a different 
order, and you get a different tree. *They're all correct and legal* in 
terms of satisfying expectations of insert, delete and query runtime -- 
but they will produce different root hashes.   And the differences in 
underlying structure are completely transparent to the calling code.


I'm extremely uncomfortable with the idea the you can have all the nodes 
in the tree, but have to replay X years of blockchain history just to 
get the same tree configuration as someone else.  However, a trie 
configuration is history-independent -- given an unspent-TxOut list, 
there's only one way to construct that tree.  That's an important 
property to me.


I can't tell if you're joking about Judy structures: I've never heard of 
them.  But I'll look into it anyway...


--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node

2012-06-19 Thread Gregory Maxwell
On Tue, Jun 19, 2012 at 1:33 PM, Alan Reiner  wrote:
> One app developer updates their
> RB tree code which updated the RB-tree optimizations/rebalancing, and
> now a significant portion of the network can't agree on the correct
> root.  Not only would that be disruptive, it would be a disaster to
> track down.

This is why good comprehensive tests and a well specified algorithim
are important. The tree update algorithm would be normative in that
scheme. Worrying that implementers might get it wrong would be like
worrying that they'd get SHA256 wrong.

> A PATRICIA tree/trie would be ideal, in my mind, as it also has a
> completely deterministic structure, and is an order-of-magnitude more

Provable libJudy trees. Oh boy.

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node

2012-06-19 Thread Alan Reiner

I hope that someone else here would chime in on the issue raised in the 
thread, about using a tree-structure that has multiple valid 
configurations for the same set of unspent-TxOuts.  If you use any 
binary tree, you must replay the entire history of insertions and 
deletions in the correct order to get the tree structure and correct 
root.  Along those lines, using something like a red-black tree, while 
theoretically well-known, could be subject to implementation errors.  
One implementation of a red-black tree may do the rebalancing 
differently, and still work for it's intended purpose in the majority of 
applications where it doesn't matter.  One app developer updates their 
RB tree code which updated the RB-tree optimizations/rebalancing, and 
now a significant portion of the network can't agree on the correct 
root.  Not only would that be disruptive, it would be a disaster to 
track down.

If we were to use a raw trie structure, then we'd have all the above 
issues solved:  a trie has the same configuration no matter how elements 
are inserted or deleted, and accesses to elements in the tree are 
constant time -- O(1).  There is no such thing as an unbalanced trie.  
But overall space-efficiency is an issue.

A PATRICIA tree/trie would be ideal, in my mind, as it also has a 
completely deterministic structure, and is an order-of-magnitude more 
space-efficient.  Insert, delete and query times are still O(1).
However, it is not a trivial implementation.  I have occasionally looked 
for implementations, but not found any that were satisfactory.

So, I don't have a good all-around solution, within my own stated 
constraints. But perhaps I'm being too demanding of this solution.

-Alan



On 06/19/2012 12:46 PM, Andrew Miller wrote:
>> Peter Todd wrote:
>> My solution was to simply state that vertexes that happened to cause the
>> tree to be unbalanced would be discarded, and set the depth of inbalance
>> such that this would be extremely unlikely to happen by accident. I'd
>> rather see someone come up with something better though.
> Here is a simpler solution. (most of this message repeats the content
> of my reply to the forum)
>
> Suppose we were talking about a binary search tree, rather than a
> Merkle tree. It's important to balance a binary search tree, so that
> the worst-case maximum length from the root to a leaf is bounded by
> O(log N). AVL trees were the original algorithm to do this, Red-Black
> trees are also popular, and there are many similar methods. All
> involve storing some form of 'balancing metadata' at each node. In a
> RedBlack tree, this is a single bit (red or black). Every operation on
> these trees, including search, inserting, deleting, and rebalancing,
> requires a worst-case effort of O(log N).
>
> Any (acyclic) recursive data structure can be Merkle-ized, simply by
> adding a hash of the child node alongside each link/pointer. This way,
> you can verify the data for each node very naturally, as you traverse
> the structure.
>
> In fact, as long as a lite-client knows the O(1) root hash, the rest
> of the storage burden can be delegated to an untrusted helper server.
> Suppose a lite-client wants to insert and rebalance its tree. This
> requires accessing at most O(log N) nodes. The client can request only
> the data relevant to these nodes, and it knows the hash for each chunk
> of data in advance of accessing it. After computing the updated root
> hash, the client can even discard the data it processed.
>
> This technique has been well discussed in the academic literature,
> e.g. [1,2], although since I am not aware of any existing
> implementation, I made my own, intended as an explanatory aid:
> https://github.com/amiller/redblackmerkle/blob/master/redblack.py
>
>
> [1] Certificate Revocation and Update
>  Naor and Nissim. 1998
>  
> http://static.usenix.org/publications/library/proceedings/sec98/full_papers/nissim/nissim.pdf
>
> [2] A General Model for Authenticated Data Structures
>  Martel, Nuckolls, Devanbu, Michael Gertz, Kwong, Stubblebine. 2004
>  http://truthsayer.cs.ucdavis.edu/algorithmica.pdf
>
> --
> Andrew Miller
>
> --
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> ___
> Bitcoin-development mailing list
> Bitcoin-development@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bitcoin-development


--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers

Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node

2012-06-19 Thread Andrew Miller
> Peter Todd wrote:
> My solution was to simply state that vertexes that happened to cause the
> tree to be unbalanced would be discarded, and set the depth of inbalance
> such that this would be extremely unlikely to happen by accident. I'd
> rather see someone come up with something better though.

Here is a simpler solution. (most of this message repeats the content
of my reply to the forum)

Suppose we were talking about a binary search tree, rather than a
Merkle tree. It's important to balance a binary search tree, so that
the worst-case maximum length from the root to a leaf is bounded by
O(log N). AVL trees were the original algorithm to do this, Red-Black
trees are also popular, and there are many similar methods. All
involve storing some form of 'balancing metadata' at each node. In a
RedBlack tree, this is a single bit (red or black). Every operation on
these trees, including search, inserting, deleting, and rebalancing,
requires a worst-case effort of O(log N).

Any (acyclic) recursive data structure can be Merkle-ized, simply by
adding a hash of the child node alongside each link/pointer. This way,
you can verify the data for each node very naturally, as you traverse
the structure.

In fact, as long as a lite-client knows the O(1) root hash, the rest
of the storage burden can be delegated to an untrusted helper server.
Suppose a lite-client wants to insert and rebalance its tree. This
requires accessing at most O(log N) nodes. The client can request only
the data relevant to these nodes, and it knows the hash for each chunk
of data in advance of accessing it. After computing the updated root
hash, the client can even discard the data it processed.

This technique has been well discussed in the academic literature,
e.g. [1,2], although since I am not aware of any existing
implementation, I made my own, intended as an explanatory aid:
https://github.com/amiller/redblackmerkle/blob/master/redblack.py


[1] Certificate Revocation and Update
Naor and Nissim. 1998

http://static.usenix.org/publications/library/proceedings/sec98/full_papers/nissim/nissim.pdf

[2] A General Model for Authenticated Data Structures
Martel, Nuckolls, Devanbu, Michael Gertz, Kwong, Stubblebine. 2004
http://truthsayer.cs.ucdavis.edu/algorithmica.pdf

--
Andrew Miller

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] LevelDB benchmarking

2012-06-19 Thread Mike Hearn
> What problem does it solve?

Primarily that block verification and therefore propagation is too
slow because it's very CPU and IO intensive. The CPU work can be
multi-threaded. The IO work, not as much. As Bitcoin grows we need to
scale the nodes. Eventually there may be multi-machine nodes, but for
now we can buy more time by making the existing nodes faster.

I don't see this as a replacement for moving users to SPV clients.
Obviously, otherwise I would not be writing one ;)

> If the problem it will solve is the "too easy to get a DB_RUNRECOVERY
> error" because bdb is fragile when it comes to its environment... then
> LevelDB looks very interesting.

I have no experience with how robust LevelDB is. It has an API call to
try and repair the database and I know from experience that BigTable
is pretty solid. But that doesn't mean LevelDB is.

> If the problem is bdb is creaky and old and has obscure semantics and
> a hard-to-work-with API, then yes, lets switch (I'm easily seduced by
> a pretty API and blazing fast performance).

The code is a lot simpler for sure.

> As long as it compiles and runs on mac/windows/linux that doesn't
> really worry me.

It was refactored out of BigTable and made standalone for usage in
Chrome. Therefore it's as portable as Chrome is. Mac/Windows/Linux
should all work. Solaris, I believe, may need 64 bit binaries to avoid
low FD limits.

> Lack of infrastructure because it is new does worry me; for example,
> could I rework bitcointools to read the LevelDB blockchain?  (are
> there python bindings for LevelDB?)

Yes: http://code.google.com/p/py-leveldb/

First look at the code is here, but it's not ready for a pull req yet,
and I'll force push over it a few times to get it into shape. So don't
branch:

https://github.com/mikehearn/bitcoin/commit/2b601dd4a0093f834084241735d84d84e484f183

It has misc other changes I made whilst profiling, isn't well
commented enough, etc.

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] LevelDB benchmarking

2012-06-19 Thread Gavin Andresen
> OK, to make progress on this work I need a few decisions (Gavin?)
>
> 1) Shall we do it?

What problem does it solve?

If the problem it will solve is "it will only take 4 hours to download
the entire blockchain next year instead of taking 16 hours" then no, I
don't think we should do it, both 4 and 16 hours to get fully up and
running is too long.

If the problem it will solve is the "too easy to get a DB_RUNRECOVERY
error" because bdb is fragile when it comes to its environment... then
LevelDB looks very interesting.

If the problem is bdb is creaky and old and has obscure semantics and
a hard-to-work-with API, then yes, lets switch (I'm easily seduced by
a pretty API and blazing fast performance).

> 2) LevelDB is obscure, new and has a very minimalist build system. It
> supports "make" but not "make install", for example, and is unlikely
> to be packaged. It's also not very large. I suggest we just check the
> source into the main Bitcoin tree and link it statically rather than
> complicate the build.

As long as it compiles and runs on mac/windows/linux that doesn't
really worry me. I just tried it, and it compiled quickly with no
complaints on my mac.

Lack of infrastructure because it is new does worry me; for example,
could I rework bitcointools to read the LevelDB blockchain?  (are
there python bindings for LevelDB?)

> 3) As the DB format would change and a slow migration period
> necessary, any other tweaks to db format we could make at the same
> time? Right now the key/values are the same as before, though using
> satoshi serialization for everything is a bit odd.

Satoshi rolled his own network serialization because he didn't trust
existing serialization solutions to be 100% secure against remote
exploits. Then it made sense to use the same solution for disk
serialization; I don't see a compelling reason to switch to some other
serialization scheme.

Modifying the database schema during migration to better support
applications like InstaWallet (tens of thousands of separate wallets)
or something like Pieter's ultra-pruning makes sense.

-- 
--
Gavin Andresen

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] LevelDB benchmarking

2012-06-19 Thread Pieter Wuille
On Tue, Jun 19, 2012 at 11:05:20AM +0200, Mike Hearn wrote:
> OK, to make progress on this work I need a few decisions (Gavin?)
> 
> 1) Shall we do it?

I'm all for moving away from BDB. It's a very good system for what it is
intended for, but that is not how we use it. The fact that it is tied to
a database environment (but people want to copy the files themselves
between systems), that is provides consistency in case of failures (but
because we remove old log files, we still see very frequent corrupted
systems), the fact that its environments are sometimes not even forward-
compatible, ...

Assuming LevelDB is an improvement in these areas as well as resulting in
a speed improvement, I like it.

> 2) LevelDB is obscure, new and has a very minimalist build system. It
> supports "make" but not "make install", for example, and is unlikely
> to be packaged. It's also not very large. I suggest we just check the
> source into the main Bitcoin tree and link it statically rather than
> complicate the build.

How portable is LevelDB? How well tested is it? What compatibility
guarantees exist between versions of the system?

I don't mind including the source code; it doesn't seem particularly
large, and the 2-clause BSD license shouldn't be a problem.

> 3) As the DB format would change and a slow migration period
> necessary, any other tweaks to db format we could make at the same
> time? Right now the key/values are the same as before, though using
> satoshi serialization for everything is a bit odd.
> 
> We'd need UI for migration as well.

Jeff was working on splitting the database into several files earlier, and
I'm working on the database/validation logic as well. Each of these will
require a rebuild of the databases anyway. If possible, we should try to
get them in a single release, so people only need to rebuild once. 

PS: can we see the code?

-- 
Pieter

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] LevelDB benchmarking

2012-06-19 Thread Mike Hearn
+list

On Mon, Jun 18, 2012 at 9:07 PM, Gregory Maxwell  wrote:
> In addition to the ECDSA caching,  ECDSA can can easily be run on
> multiple cores for basically a linear speedup.. so even with the
> checking in place once ECDSA was using multiple threads we'd be back
> to the DB being the bottleneck for this kind of case.

Maybe ... looking again I think I may be wrong about being IO bound in
the last benchmark. The core running the main Bitcoin thread is still
pegged and the LevelDB background thread is only spending around 20%
of its time in iowait. An oprofile shows most of the time being spent
inside a std::map.

OK, to make progress on this work I need a few decisions (Gavin?)

1) Shall we do it?

2) LevelDB is obscure, new and has a very minimalist build system. It
supports "make" but not "make install", for example, and is unlikely
to be packaged. It's also not very large. I suggest we just check the
source into the main Bitcoin tree and link it statically rather than
complicate the build.

3) As the DB format would change and a slow migration period
necessary, any other tweaks to db format we could make at the same
time? Right now the key/values are the same as before, though using
satoshi serialization for everything is a bit odd.

We'd need UI for migration as well.

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development