[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781097#comment-17781097
 ] 

Andres de la Peña commented on CASSANDRA-18715:
---

Fix committed to 5.0 as 
[c4d11c4372906ae1dea9e6c31c1136f122e8a1b2|https://github.com/apache/cassandra/commit/c4d11c4372906ae1dea9e6c31c1136f122e8a1b2]
 and merged to 
[{{trunk}}|https://github.com/apache/cassandra/commit/d6159cfe151316964918407b2b4099d5678892eb].

Thanks for the super-quick fix.

> Add support for vector search in SAI
> 
>
> Key: CASSANDRA-18715
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18715
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/Vector Search
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Normal
> Fix For: 5.0-alpha2, 5.0, 5.1
>
> Attachments: signature.asc
>
>  Time Spent: 25h 40m
>  Remaining Estimate: 0h
>
> The patch associated with this ticket adds a new vector index to SAI. This 
> introduces the following new elements and changes to SAI:
>  * VectorMemtableIndex - the in-memory representation of the vector indexes 
> that writes data to a DiskANN instance
>  * VectorSegmentBuilder - that writes a DiskANN graph to the following 
> on-disk components:
>  ** VECTOR - contains the floating point vectors associated with the graph
>  ** TERMS - contains the HNSW graph on-disk representation written by a 
> HnswGraphWriter
>  ** POSTINGS - contains the index postings as written by a 
> VectorPostingsWriter
>  * VectorIndexSegmentSearcher - used to search the on-disk DiskANN graph



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-30 Thread Mike Adamson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781092#comment-17781092
 ] 

Mike Adamson commented on CASSANDRA-18715:
--

[~adelapena] Yes, I would prefer that. I am actively working on the distributed 
test failures. I will have a look at the trunk failures as well.




[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781084#comment-17781084
 ] 

Andres de la Peña commented on CASSANDRA-18715:
---

The runs have finished. 5.0 is perfectly clean. {{trunk}} however shows 
failures for:
 * {{VectorDistributedTest.rangeRestrictedTest}}
 * {{VectorDistributedTest.testPartitionRestrictedVectorSearch}}
 * {{CQLVectorTest.sandwichBetweenUDTs}}
 * {{VectorSegmentationTest.testMultipleSegmentsForCompaction}}

I think the only known one was 
{{VectorDistributedTest.rangeRestrictedTest}}. The others are well below 1% 
flakiness.

[~mike_tr_adamson] Is it OK to commit the current fix for 
{{VectorUpdateDeleteTest}} and deal with the others in a separate ticket?
 




[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781072#comment-17781072
 ] 

Andres de la Peña commented on CASSANDRA-18715:
---

Ah, the NEWS.txt entry is there for 5.0, but it's missing in trunk.




[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781070#comment-17781070
 ] 

Andres de la Peña commented on CASSANDRA-18715:
---

We are also missing [the {{NEWS.txt}} 
entry|https://github.com/apache/cassandra/pull/2673#pullrequestreview-1701435829].
 I'll add it to the hotfix.




[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781049#comment-17781049
 ] 

Andres de la Peña commented on CASSANDRA-18715:
---

Sure, I'll merge it as a hotfix once the CI runs above finish.




[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781039#comment-17781039
 ] 

Andres de la Peña commented on CASSANDRA-18715:
---

It seems this was committed over the weekend without a final CI round including 
the last commit added on Thursday. That commit included the test that we have 
seen failing on Jenkins. It also included a few new unit tests that we have not 
run in the multiplexer.

Here are runs for all the tests with both j11 and j17, including the last fix:
|[j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/3271/workflows/517b8893-9eeb-42df-a535-4bce388692d2]|[j17|https://app.circleci.com/pipelines/github/adelapena/cassandra/3271/workflows/f41dbf15-d2ba-4b87-a621-ec4c50f5f9a7]|
|[j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/3272/workflows/cb7592f2-5bba-4957-8e33-55e164a2aab0]|[j17|https://app.circleci.com/pipelines/github/adelapena/cassandra/3272/workflows/0d060872-34d8-43cb-b397-90ea0f3f3255]|

> Add support for vector search in SAI
> 
>
> Key: CASSANDRA-18715
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18715
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/Vector Search
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Normal
> Fix For: 5.0-alpha2, 5.0, 5.1
>
>  Time Spent: 25h 40m
>  Remaining Estimate: 0h
>
> The patch associated with this ticket adds a new vector index to SAI. This 
> introduces the following new elements and changes to SAI:
>  * VectorMemtableIndex - the in-memory representation of the vector indexes 
> that writes data to a DiskANN instance
>  * VectorSegmentBuilder - that writes a DiskANN graph to the following 
> on-disk components:
>  ** VECTOR - contains the floating point vectors associated with the graph
>  ** TERMS - contains the HNSW graph on-disk representation written by a 
> HnswGraphWriter
>  ** POSTINGS - contains the index postings as written by a 
> VectorPostingsWriter
>  * VectorIndexSegmentSearcher - used to search the on-disk DiskANN graph






[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-30 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781020#comment-17781020
 ] 

Ekaterina Dimitrova commented on CASSANDRA-18715:
-

We need to run the test suites for both JDK versions before committing. I can 
see only JDK 11 results published on the ticket. Please consider it for future 
tickets.

Also, thank you for all the work done here! I am excited to see the feature 
ready!




[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-30 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780985#comment-17780985
 ] 

Stefan Miklosovic commented on CASSANDRA-18715:
---

OK. The CI runs and local testing look fine. I am going to merge it to 5.0 and 
trunk shortly.




[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-30 Thread Mike Adamson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780980#comment-17780980
 ] 

Mike Adamson commented on CASSANDRA-18715:
--

[~smiklosovic] I would prefer to merge this one now, because it fails on every 
run, and to put up a separate patch for the rangeRestrictedTest. That is a 
random test failure and is going to take a little longer to fix. I am actively 
working on it, so hopefully not too long.




[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-30 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780978#comment-17780978
 ] 

Michael Semb Wever commented on CASSANDRA-18715:


+1 on the VectorUpdateDeleteTest fix. The code changes make sense, and I have 
confirmed that the tests are fixed with it.




[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-30 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780976#comment-17780976
 ] 

Stefan Miklosovic commented on CASSANDRA-18715:
---

Thank you [~mike_tr_adamson] for the fast fix.

+1 on that particular patch you just posted.

Do we want to wait for the other fix too (VectorDistributedTest / 
rangeRestrictedTest), so that everything is committed at once and we are done, 
or do you prefer to merge them one by one?




[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-30 Thread Mike Adamson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780972#comment-17780972
 ] 

Mike Adamson commented on CASSANDRA-18715:
--

PRs and CI for fixing VectorUpdateDeleteTest are here:

|[5.0|https://github.com/apache/cassandra/pull/2850]|[CI|https://app.circleci.com/pipelines/github/mike-tr-adamson/cassandra/357/workflows/46d6ab84-a2e9-4bcd-8a08-3ae2089476c9]|




[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-30 Thread Mike Adamson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780932#comment-17780932
 ] 

Mike Adamson commented on CASSANDRA-18715:
--

[~smiklosovic] [~mck] 

I am testing a patch for this. I will post the patch here with test runs when 
the test runs are complete.

> Add support for vector search in SAI
> 
>
> Key: CASSANDRA-18715
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18715
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/Vector Search
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Normal
> Fix For: 5.0-alpha2, 5.0, 5.1
>
>  Time Spent: 25.5h
>  Remaining Estimate: 0h
>
> The patch associated with this ticket adds a new vector index to SAI. This 
> introduces the following new elements and changes to SAI:
>  * VectorMemtableIndex - the in-memory representation of the vector indexes 
> that writes data to a DiskANN instance
>  * VectorSegmentBuilder - that writes a DiskANN graph to the following 
> on-disk components:
>  ** VECTOR - contains the floating point vectors associated with the graph
>  ** TERMS - contains the HNSW graph on-disk representation written by a 
> HnswGraphWriter
>  ** POSTINGS - contains the index postings as written by a 
> VectorPostingsWriter
>  * VectorIndexSegmentSearcher - used to search the on-disk DiskANN graph






[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-30 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780901#comment-17780901
 ] 

Michael Semb Wever commented on CASSANDRA-18715:


Easily reproduced with 
```
.build/docker/run-tests.sh test "VectorUpdateDeleteTest" 17
```




[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-30 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780892#comment-17780892
 ] 

Stefan Miklosovic commented on CASSANDRA-18715:
---

I see a couple of failures which seem to be related:

https://ci-cassandra.apache.org/job/Cassandra-trunk/1758/#showFailuresLink

org.apache.cassandra.index.sai.cql.VectorUpdateDeleteTest.ensureVariableChunkSizeDoesNotLeadToIncorrectResults-cdc.jdk17.arch=x86_64.python2.7
org.apache.cassandra.index.sai.cql.VectorUpdateDeleteTest.ensureVariableChunkSizeDoesNotLeadToIncorrectResults-compression.jdk17.arch=x86_64.python2.7
org.apache.cassandra.index.sai.cql.VectorUpdateDeleteTest.ensureVariableChunkSizeDoesNotLeadToIncorrectResults-.jdk17.arch=x86_64.python2.7

like

{code}
java.lang.NoSuchFieldException: modifiers
    at java.base/java.lang.Class.getDeclaredField(Class.java:2610)
    at org.apache.cassandra.index.sai.cql.VectorUpdateDeleteTest.setChunkSize(VectorUpdateDeleteTest.java:556)
    at org.apache.cassandra.index.sai.cql.VectorUpdateDeleteTest.ensureVariableChunkSizeDoesNotLeadToIncorrectResults(VectorUpdateDeleteTest.java:548)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
{code}
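
For context, a {{NoSuchFieldException: modifiers}} like the one above is typically what the JDK reports when code looks up the private {{modifiers}} field of {{java.lang.reflect.Field}}, a trick sometimes used in tests to strip {{final}} from a field. Since JDK 12 the fields of {{Field}} are filtered from reflection, so that lookup can succeed on JDK 11 and fail on JDK 17. The sketch below only probes for this behaviour. The class and method names are hypothetical, and it is neither the code of {{VectorUpdateDeleteTest.setChunkSize}} nor the fix that was committed for it.

```java
import java.lang.reflect.Field;

// Hypothetical probe class, for illustration only.
class ModifiersProbe {

    /**
     * Returns true if Field.class still exposes its private "modifiers"
     * field to reflection on the running JVM, false if the lookup is
     * rejected with NoSuchFieldException.
     */
    static boolean legacyModifiersFieldAvailable() {
        try {
            Field.class.getDeclaredField("modifiers");
            return true;
        } catch (NoSuchFieldException e) {
            // Same exception type and message as in the trace above.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("Field.modifiers visible = " + legacyModifiersFieldAvailable());
    }
}
```

Running this on both JDKs is one way to confirm that a failure of this kind is tied to the JDK version rather than to the test data, which would be consistent with the failing jobs above all being {{jdk17}} runs.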

cc [~adelapena] [~mike_tr_adamson]




[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-27 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780448#comment-17780448
 ] 

Michael Semb Wever commented on CASSANDRA-18715:


Reviewers have approved the 5.0 PR. Agreed with [~mike_tr_adamson] to commit 
this, and we'll file the flaky test as a separate ticket (it has been 
identified and is already being worked on).

> Add support for vector search in SAI
> 
>
> Key: CASSANDRA-18715
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18715
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/Vector Search
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Normal
> Fix For: 5.0-beta, 5.x
>
>  Time Spent: 25h
>  Remaining Estimate: 0h
>
> The patch associated with this ticket adds a new vector index to SAI. This 
> introduces the following new elements and changes to SAI:
>  * VectorMemtableIndex - the in-memory representation of the vector indexes 
> that writes data to a DiskANN instance
>  * VectorSegmentBuilder - that writes a DiskANN graph to the following 
> on-disk components:
>  ** VECTOR - contains the floating point vectors associated with the graph
>  ** TERMS - contains the HNSW graph on-disk representation written by a 
> HnswGraphWriter
>  ** POSTINGS - contains the index postings as written by a 
> VectorPostingsWriter
>  * VectorIndexSegmentSearcher - used to search the on-disk DiskANN graph






[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-26 Thread Jonathan Ellis (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779948#comment-17779948
 ] 

Jonathan Ellis commented on CASSANDRA-18715:


As the primary author of the new JVector dependency, I can also verify that my 
code, while not a contribution to ASF, is my original work. It also includes 
complete details of any third-party licenses or restrictions I am aware of, in 
line with the spirit of clauses #5 and #7 of the ASF ICLA.

> Add support for vector search in SAI
> 
>
> Key: CASSANDRA-18715
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18715
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/Vector Search
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Normal
> Fix For: 5.0-beta, 5.x
>
>  Time Spent: 24h 50m
>  Remaining Estimate: 0h
>






[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-26 Thread Mike Adamson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779830#comment-17779830
 ] 

Mike Adamson commented on CASSANDRA-18715:
--

The contributors of the patch associated with this ticket wish to make it clear 
that the contribution adheres to clauses #5 & #7 of the Apache Foundation 
[ICLA|https://www.apache.org/licenses/icla.pdf].

> Add support for vector search in SAI
> 
>
> Key: CASSANDRA-18715
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18715
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/Vector Search
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Normal
> Fix For: 5.0-beta, 5.x
>
>  Time Spent: 24h 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779821#comment-17779821
 ] 

Andres de la Peña commented on CASSANDRA-18715:
---

CI looks good, although it seems one of the repeated runs of 
{{VectorDistributedTest}} on trunk has failed on {{rangeRestrictedTest}}. Output 
file 
[here|https://output.circle-artifacts.com/output/job/4c433c82-c048-4776-9c3f-ee6656f94d24/artifacts/5/stdout/fails/17/org.apache.cassandra.distributed.test.sai.VectorDistributedTest.txt].

> Add support for vector search in SAI
> 
>
> Key: CASSANDRA-18715
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18715
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/Vector Search
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Normal
> Fix For: 5.0-beta, 5.x
>
>  Time Spent: 24h 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-10-25 Thread Mike Adamson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779562#comment-17779562
 ] 

Mike Adamson commented on CASSANDRA-18715:
--

The latest CI runs are here:
|[5.0|https://github.com/apache/cassandra/pull/2673]|[CI|https://app.circleci.com/pipelines/github/mike-tr-adamson/cassandra/348/workflows/982fd591-c53a-4aa6-9a95-68f10df6bfae]|
|[trunk|https://github.com/apache/cassandra/pull/2765]|[CI|https://app.circleci.com/pipelines/github/mike-tr-adamson/cassandra/349/workflows/20dc9c0d-695a-4f85-b2e4-a568bac06bc6]|

> Add support for vector search in SAI
> 
>
> Key: CASSANDRA-18715
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18715
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/Vector Search
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Normal
> Fix For: 5.0-beta, 5.x
>
>  Time Spent: 24h 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (CASSANDRA-18715) Add support for vector search in SAI

2023-09-11 Thread Mike Adamson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763728#comment-17763728
 ] 

Mike Adamson commented on CASSANDRA-18715:
--

|[5.0|https://github.com/apache/cassandra/pull/2673]|[CI|https://app.circleci.com/pipelines/github/mike-tr-adamson/cassandra/288/workflows/e102cc8b-303a-400e-8265-f49cadf08eb5/jobs/19867]|

This is an initial test run. I will continue to fix flaky tests.

> Add support for vector search in SAI
> 
>
> Key: CASSANDRA-18715
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18715
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/Vector Search
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Normal
>



