Re: Large primary keys

2016-04-14 Thread Robert Wille
That would be a nice solution, but 3.4 is way too bleeding edge. I’ll just go with the digest for now. Thanks for pointing it out. I’ll have to consider a migration in the future when production is on 3.x. On Apr 11, 2016, at 10:19 PM, Jack Krupansky

Re: Large primary keys

2016-04-11 Thread Jack Krupansky
Check out the text indexing capability of the new SASI feature in Cassandra 3.4. You could write a custom tokenizer to extract entities and then be able to query for documents that contain those entities. That said, using a SHA digest key for the primary key has merit for direct access to the

Re: Large primary keys

2016-04-11 Thread James Carman
S3 maybe? On Mon, Apr 11, 2016 at 7:05 PM Robert Wille wrote: > I do realize it's kind of a weird use case, but it is legitimate. I have a > collection of documents that I need to index, and I want to perform entity > extraction on them and give the extracted entities special

Re: Large primary keys

2016-04-11 Thread Robert Wille
I do realize it's kind of a weird use case, but it is legitimate. I have a collection of documents that I need to index, and I want to perform entity extraction on them and give the extracted entities special treatment in my full-text index. Because entity extraction costs money, and each

Re: Large primary keys

2016-04-11 Thread Jan Kesten
Hi Robert, why do you need the actual text as a key? It sounds a bit unnatural, at least to me. Keep in mind that you cannot do "like" queries on keys in Cassandra. For performance and readability, I would prefer hashing your text and using the hash as the key. You should also take

Re: Large primary keys

2016-04-11 Thread James Carman
Why does the text need to be the key? On Mon, Apr 11, 2016 at 6:04 PM Robert Wille wrote: > I have a need to be able to use the text of a document as the primary key > in a table. These texts are usually less than 1K, but can sometimes be 10’s > of K’s in size. Would it be

Re: Large primary keys

2016-04-11 Thread Bryan Cheng
While large primary keys (within reason) should work, IMO anytime you're doing equality testing you are really better off minimizing the size of the key. Huge primary keys will also have very negative impacts on your key cache. I would err on the side of the digest, but I've never had a need for
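
A minimal sketch of the digest approach suggested in the thread, for illustration only. It assumes SHA-256 and UTF-8 encoding, neither of which the thread specifies (the posts only say "SHA digest" and "hash"), and the function name `document_key` is hypothetical. The point it shows is that the key stays the same small size whether the document is under 1K or tens of K.

```python
import hashlib


def document_key(text: str) -> str:
    """Return a fixed-size hex digest to use as the key instead of the full text.

    Assumptions: SHA-256 and UTF-8 are illustrative choices, not from the thread.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_doc = "Cassandra is a distributed database."
long_doc = "entity " * 5000  # a document of roughly 35 KB

for doc in (short_doc, long_doc):
    key = document_key(doc)
    # The key is always 64 hex characters, regardless of document size.
    print(len(doc), len(key))
```

The digest would be stored as the primary key and the original text as a regular column, so equality lookups compare 64 characters rather than the whole document.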