Hi Joe,

Thanks for testing out vector search!

Cassandra 5.0 is about six months behind on vector search progress.  Part
of this is keeping up with JVector releases but more of it is core
improvements to SAI.  Unfortunately there's no easy fix for the impedance
mismatch between a field where the state of the art is improving almost
daily, and a project with a release cycle measured in years.

DataStax's cutting-edge vector search work is public and open source [1]
but it's going to be a while before we have bandwidth to upstream it to
Apache, and longer before it can be released in 5.1 or 6.0.  If you're
interested in collaborating on this, I'm happy to get you pointed in the
right direction.

In the meantime, I can also recommend trying out DataStax's Astra [2]
service, where we deploy improvements regularly.  My guesstimate is that
Astra will be 2x faster at vanilla ANN queries (with no WHERE clause) and
10x-100x faster at queries with additional predicates, depending on the
cardinality.  (As an example of what needs to be upstreamed, we added a
primitive cost-based analyzer back in January to fix the kind of timeouts
you're seeing with offset=1, and we just committed a more sophisticated one
this week [3].)
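
For readers wondering what a cost-based choice looks like for an ANN query with a predicate, here is a toy sketch. It is not the analyzer from [3]; the function name, the two plan names, and the cost formulas are all assumptions made up for illustration. The idea is only that when few rows match the predicate, scoring those rows directly is cheaper, and when many match, walking the ANN index and discarding non-matching rows is cheaper.

```python
import math

def choose_plan(total_rows, matching_rows, limit):
    """Pick the cheaper of two hypothetical plans for an ANN query with a predicate.

    The cost formulas are illustrative assumptions, not Cassandra's.
    """
    if matching_rows == 0:
        return "filter_then_sort"
    # Plan A: fetch every row that matches the predicate and score it exactly.
    filter_then_sort_cost = matching_rows
    # Plan B: walk the ANN index and discard rows that fail the predicate.
    # The fewer rows match, the more candidates must be visited to fill LIMIT.
    selectivity = matching_rows / total_rows
    ann_then_filter_cost = (limit / selectivity) * math.log2(max(total_rows, 2))
    if filter_then_sort_cost <= ann_then_filter_cost:
        return "filter_then_sort"
    return "ann_then_filter"

# A rare predicate value versus a very common one, on a million-row table.
rare = choose_plan(total_rows=1_000_000, matching_rows=50, limit=20)
common = choose_plan(total_rows=1_000_000, matching_rows=900_000, limit=20)
```

Without some choice like this, a query is stuck with one plan regardless of how many rows match the predicate, which is one plausible way to end up with timeouts.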

If you're stuck with 5.0, my best advice is to compact as aggressively as
possible, since SAI queries are O(N) in the number of sstables.
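
To make the O(N) point concrete: SAI keeps a separate index per sstable, so a query has to search each one and merge the results. The toy model below is not Cassandra code. The "segments" are plain Python lists standing in for per-sstable indexes, brute-force cosine similarity stands in for the real index search, and all names are made up for illustration.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search_segment(segment, query, k):
    """Top-k of one segment; stands in for one per-sstable index search."""
    scored = ((cosine_similarity(vec, query), row_id) for row_id, vec in segment)
    return heapq.nlargest(k, scored)

def search_table(segments, query, k):
    """Search every segment, then merge. Returns (top-k, number of index searches)."""
    candidates = []
    for segment in segments:
        candidates.extend(search_segment(segment, query, k))
    return heapq.nlargest(k, candidates), len(segments)

# Twelve rows, stored either as six small segments or as one compacted segment.
rows = [(i, [math.cos(i), math.sin(i)]) for i in range(12)]
six_segments = [rows[i::6] for i in range(6)]
one_segment = [rows]
query = [1.0, 0.0]

top_six, searches_six = search_table(six_segments, query, 3)
top_one, searches_one = search_table(one_segment, query, 3)
```

Both layouts return the same rows, but the six-segment layout pays for six index searches plus a merge, while the compacted layout pays for one. On a real cluster, a major compaction can be requested with `nodetool compact <keyspace> <table>`.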

[1] https://github.com/datastax/cassandra/tree/vsearch
[2] https://www.datastax.com/products/datastax-astra
[3] https://github.com/datastax/cassandra/commit/eeb33dd62b9b74ecf818a263fd73dbe6714b0df0

On Thu, Mar 21, 2024 at 9:19 AM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Hi All - I'd like to share some initial results for the vector search on
> Cassandra 5.0 beta1.  3 node cluster running in kubernetes; fast Netapp
> storage.
>
> Have a table (doc.embeddings_googleflant5large) with definition:
>
> CREATE TABLE doc.embeddings_googleflant5large (
>      uuid text,
>      type text,
>      fieldname text,
>      offset int,
>      sourceurl text,
>      textdata text,
>      creationdate timestamp,
>      embeddings vector<float, 768>,
>      metadata boolean,
>      source text,
>      PRIMARY KEY ((uuid, type), fieldname, offset, sourceurl, textdata)
> ) WITH CLUSTERING ORDER BY (fieldname ASC, offset ASC, sourceurl ASC,
> textdata ASC)
>      AND additional_write_policy = '99p'
>      AND allow_auto_snapshot = true
>      AND bloom_filter_fp_chance = 0.01
>      AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>      AND cdc = false
>      AND comment = ''
>      AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32', 'min_threshold': '4'}
>      AND compression = {'chunk_length_in_kb': '16', 'class':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>      AND memtable = 'default'
>      AND crc_check_chance = 1.0
>      AND default_time_to_live = 0
>      AND extensions = {}
>      AND gc_grace_seconds = 864000
>      AND incremental_backups = true
>      AND max_index_interval = 2048
>      AND memtable_flush_period_in_ms = 0
>      AND min_index_interval = 128
>      AND read_repair = 'BLOCKING'
>      AND speculative_retry = '99p';
>
> CREATE CUSTOM INDEX ann_index_googleflant5large ON
> doc.embeddings_googleflant5large (embeddings) USING 'sai';
> CREATE CUSTOM INDEX offset_index_googleflant5large ON
> doc.embeddings_googleflant5large (offset) USING 'sai';
>
> nodetool status -r
>
> UN  cassandra-1.cassandra5.cassandra5-jos.svc.cluster.local  18.02 GiB  128  100.0%  f2989dea-908b-4c06-9caa-4aacad8ba0e8  rack1
> UN  cassandra-2.cassandra5.cassandra5-jos.svc.cluster.local  17.98 GiB  128  100.0%  ec4e506d-5f0d-475a-a3c1-aafe58399412  rack1
> UN  cassandra-0.cassandra5.cassandra5-jos.svc.cluster.local  18.16 GiB  128  100.0%  92c6d909-ee01-4124-ae03-3b9e2d5e74c0  rack1
>
> nodetool tablestats doc.embeddings_googleflant5large
>
> Total number of tables: 1
> ----------------
> Keyspace: doc
>          Read Count: 0
>          Read Latency: NaN ms
>          Write Count: 2893108
>          Write Latency: 326.3586520174843 ms
>          Pending Flushes: 0
>                  Table: embeddings_googleflant5large
>                  SSTable count: 6
>                  Old SSTable count: 0
>                  Max SSTable size: 5.108GiB
>                  Space used (live): 19318114423
>                  Space used (total): 19318114423
>                  Space used by snapshots (total): 0
>                  Off heap memory used (total): 4874912
>                  SSTable Compression Ratio: 0.97448
>                  Number of partitions (estimate): 58399
>                  Memtable cell count: 0
>                  Memtable data size: 0
>                  Memtable off heap memory used: 0
>                  Memtable switch count: 16
>                  Speculative retries: 0
>                  Local read count: 0
>                  Local read latency: NaN ms
>                  Local write count: 2893108
>                  Local write latency: NaN ms
>                  Local read/write ratio: 0.00000
>                  Pending flushes: 0
>                  Percent repaired: 100.0
>                  Bytes repaired: 9.066GiB
>                  Bytes unrepaired: 0B
>                  Bytes pending repair: 0B
>                  Bloom filter false positives: 7245
>                  Bloom filter false ratio: 0.00286
>                  Bloom filter space used: 87264
>                  Bloom filter off heap memory used: 87216
>                  Index summary off heap memory used: 34624
>                  Compression metadata off heap memory used: 4753072
>                  Compacted partition minimum bytes: 2760
>                  Compacted partition maximum bytes: 4866323
>                  Compacted partition mean bytes: 154523
>                  Average live cells per slice (last five minutes): NaN
>                  Maximum live cells per slice (last five minutes): 0
>                  Average tombstones per slice (last five minutes): NaN
>                  Maximum tombstones per slice (last five minutes): 0
>                  Droppable tombstone ratio: 0.00000
>
> nodetool tablehistograms doc.embeddings_googleflant5large
>
> doc/embeddings_googleflant5large histograms
> Percentile  Read Latency  Write Latency  SSTables  Partition Size  Cell Count
>             (micros)      (micros)                 (bytes)
> 50%         0.00          0.00           0.00      105778          124
> 75%         0.00          0.00           0.00      182785          215
> 95%         0.00          0.00           0.00      379022          446
> 98%         0.00          0.00           0.00      545791          642
> 99%         0.00          0.00           0.00      654949          924
> Min         0.00          0.00           0.00      2760            4
> Max         0.00          0.00           0.00      4866323         5722
>
> Running a query such as:
>
> select uuid,offset,type,textdata from doc.embeddings_googleflant5large
> order by embeddings ANN OF [768 dimension vector] limit 20;
>
> Works fine - typically less than 5 seconds to return.  Subsequent
> queries are even faster.  If I'm actively adding data to the table, the
> searches can sometimes time out (using cqlsh).
> If I add something to the where clause, the performance drops
> significantly:
>
> select uuid,offset,type,textdata from doc.embeddings_googleflant5large
> where offset=1 order by embeddings ANN OF [] limit 20;
>
> That query will time out when running in cqlsh, even with no data being
> added to the table.
> We've been running a Weaviate database side-by-side with Cassandra 4,
> and would love to drop Weaviate if we can do all the vector searches
> inside of Cassandra.
> What else can I try?  Anything to increase performance?
> Thanks all!
>
> -Joe
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
