That would be a nice solution, but 3.4 is way too bleeding edge. I’ll just go
with the digest for now. Thanks for pointing it out. I’ll have to consider a
migration in the future when production is on 3.x.
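
For reference, the digest approach discussed in this thread can be sketched roughly as follows. This is only an illustration, not code from the thread: the choice of SHA-256, the UTF-8 encoding, and the function name `document_key` are all assumptions, and the resulting 32-byte value would presumably be stored in a `blob` (or hex `text`) primary key column.

```python
import hashlib


def document_key(text: str) -> bytes:
    """Return a fixed-size SHA-256 digest of the document text.

    Hypothetical helper: the name, the hash function, and the encoding
    are assumptions made for this sketch.
    """
    # Encode explicitly so the same text always produces the same bytes.
    return hashlib.sha256(text.encode("utf-8")).digest()


# A short document and a much larger one (tens of KB) both map to 32 bytes.
short_doc = "A small document."
large_doc = "A much larger document. " * 2000

short_key = document_key(short_doc)
large_key = document_key(large_doc)

print(len(short_doc.encode("utf-8")), "->", len(short_key))
print(len(large_doc.encode("utf-8")), "->", len(large_key))
```

The key size stays constant no matter how large the document is, which is the property that keeps equality lookups and the key cache cheap. Distinct texts could in principle collide, so the full text can still be kept in a regular column if it needs to be verified or read back.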
On Apr 11, 2016, at 10:19 PM, Jack Krupansky wrote:
Check out the text indexing support in the new SASI feature in Cassandra
3.4. You could write a custom tokenizer to extract entities and then be
able to query for documents that contain those entities.
That said, using a SHA digest key for the primary key has merit for direct
access to the
S3 maybe?
On Mon, Apr 11, 2016 at 7:05 PM Robert Wille wrote:
> I do realize it's kind of a weird use case, but it is legitimate. I have a
> collection of documents that I need to index, and I want to perform entity
> extraction on them and give the extracted entities special
I do realize it's kind of a weird use case, but it is legitimate. I have a
collection of documents that I need to index, and I want to perform entity
extraction on them and give the extracted entities special treatment in my
full-text index. Because entity extraction costs money, and each
Hi Robert,
why do you need the actual text as a key? It sounds a bit unnatural, at
least to me. Keep in mind that you cannot do "like" queries on keys in
Cassandra. For performance, and to keep things more readable, I would
prefer hashing your text and using the hash as the key.
You should also take
Why does the text need to be the key?
On Mon, Apr 11, 2016 at 6:04 PM Robert Wille wrote:
> I have a need to be able to use the text of a document as the primary key
> in a table. These texts are usually less than 1 KB, but can sometimes be
> tens of kilobytes in size. Would it be
While large primary keys (within reason) should work, IMO anytime you're
doing equality testing you are really better off minimizing the size of the
key. Huge primary keys will also have very negative impacts on your key
cache. I would err on the side of the digest, but I've never had a need for