On Mon, Sep 26, 2016 at 3:43 AM, Stuart Bishop <stu...@stuartbishop.net> wrote:
> On 26 September 2016 at 11:19, Greg Spiegelberg <gspiegelb...@gmail.com>
> wrote:
>
>> I did look at PostgresXL and CitusDB. Both are admirable however neither
>> could support the need to read a random record consistently under 30ms.
>> It's a similar problem Cassandra and others have: network latency. At this
>> scale, to provide the ability to access any given record amongst trillions
>> it is imperative to know precisely where it is stored (system & database)
>> and read a relatively small index. I have other requirements that prohibit
>> use of any technology that is eventually consistent.
>>
>> I liken the problem to fishing. To find a particular fish of length,
>> size, color &c in a data lake you must accept the possibility of scanning
>> the entire lake. However, if all fish were in barrels where each barrel
>> had a particular kind of fish of specific length, size, color &c then the
>> problem is far simpler.
>>
>> -Greg
>
> My gut tells me that if you do solve the problem and get PostgreSQL (or
> anything) reading consistently at under 30ms with that many tables you will
> have solved one problem by creating another.

Exactly why I am exploring. What are the trade-offs?

> You discounted Cassandra due to network latency, but are now trying a
> monolithic PostgreSQL setup. It might be worth trying a single node
> ScyllaDB or Cassandra deploy (no need for QUORUM or network overhead),
> perhaps using layered compaction so all your data gets broken out into
> 160MB chunks. And certainly wander over to the ScyllaDB mailing list, as
> they are very focused on performance problems like yours and should offer
> some insight even if a Cassandra style architecture cannot meet your
> requirements.

Cassandra performance, according to the experts I consulted, starts to fall
off once the stored dataset exceeds ~3 TB. Much too small for my use case.
Again, I do have other reasons for not using Cassandra and the others,
namely deduplication of information referenced by my millions of tables.
There are no such guarantees in many of the offerings outside the RDBMS
realm.

> An alternative, if you exhaust or don't trust other options: use a foreign
> data wrapper to access your own custom storage. A single table at the PG
> level, you can shard the data yourself into 8 bazillion separate stores, in
> whatever structure suits your read and write operations (maybe reusing an
> embedded db engine, ordered flat file+log+index, whatever).

However, even 8 bazillion FDW's may cause an "overflow" of relationships,
and I would lose the benefit of an efficient storage engine while PostgreSQL
acts as little more than a traffic cop. In such a case, I would opt to put
that routing logic in the application and access the true storage directly
rather than going through FDW's.

-Greg
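To sketch what I mean by putting the routing logic in the app: the "barrel"
for a record is computed deterministically from its key, so a point read
touches exactly one small store and one small index, with no scan of the
lake. A rough Python sketch (the shard count and the fish_NNNNN naming
scheme are made up for illustration, not my actual layout):

```python
import hashlib

N_BARRELS = 8192  # hypothetical number of "barrels" (shards)

def barrel_for(key: str) -> str:
    """Deterministically map a record key to its barrel name.

    The app computes the target store itself instead of asking the
    database (or an FDW layer) to route, so a lookup goes straight
    to one small table/file and its small index.
    """
    h = int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)
    return f"fish_{h % N_BARRELS:05d}"

# Usage: the application builds the table (or file/store) name and
# queries only that one relation, e.g. with psycopg2:
#   cur.execute(f"SELECT * FROM {barrel_for(key)} WHERE id = %s", (key,))
```

The same function runs identically on every app server, so no shared
lookup table or coordinator is needed to find where a record lives.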