On Thu, Feb 16, 2017 at 8:36 AM, Paul Brannan <[email protected]> wrote:
> Thank you for the quick response! > > I do understand the desire to not present in the API the appearance of a > feature that isn't really there. As an example, I was reading KUDU-1291 > yesterday and realized that since predicates are specified by name and not > column index, it's easy to accidentally construct an inefficient scan by > omitting the first component of the key. It's not surprising given how > kudu works, but it's not obvious just by looking at the API. > > I would like to contribute a patch, though I would like to learn more > about kudu before I feel comfortable. In this case, for example, my naive > solution may work, but isn't very robust -- since it has to switch on the > data type of each cell, it will fail when new data types are added. > I don't think that's too bad - we have lots of places where we have to add a new case label when a new type is added, so what's one more? :) > I have a hunch it's possible to copy row_data_ directly somehow, but I > have no idea how to correctly set isset_bitmap_ or owned_strings_bitmap_. > "isset_bitmap_" should probably be set based on the projected columns of the scanner. "owned_stirngs_bitmap_" indicates which strings in the row are "owned" by the PartialRow rather than "referenced". I think to make this API safe, we'd probably want to clone all of the string data, which would imply setting owned_strings_bitmap_ to 1 bits for those strings. Hope that helps -Todd > > On Wed, Feb 15, 2017 at 1:32 PM, Todd Lipcon <[email protected]> wrote: > >> Hi Paul, >> >> You're correct that there is no predicate-based delete currently >> available, so you have to scan and then feed the results of the scan back >> into your desired mutations/deletes. This is intentional, since right now >> we don't have multi-row transactional capabilities, and a "delete by >> predicate" API would probably give the false impression that it is >> transactional. >> >> I also think you're right that there isn't a nice way of propagating a >> RowResult into a PartialRow as you need to do here. It seems you're a C++ >> programmer -- any interest in contributing a patch to make this >> transformation a bit easier? >> >> -Todd >> >> >> >> On Wed, Feb 15, 2017 at 10:27 AM, Paul Brannan < >> [email protected]> wrote: >> >>> I want to delete all rows that match a particular partial key. For >>> example, if my schema includes columns "foo", "bar", and "baz" in its >>> primary key, I want to be able to delete all rows with "foo=16" and >>> "bar=32", regardless of the value of baz. If I attempt to apply a >>> KuduDelete without specifying "baz", I get an error "Illegal state: Key not >>> specified". >>> >>> The best I have come up with so far is to do a scan and copy the data >>> cell-by-cell from the RowPtr returned by the scan into the KuduPartialRow >>> used by the delete; I don't see any good way in the interface to copy row >>> data from one to the other without copying cell-by-cell. The code looks >>> something like: >>> >>> for (auto idx : primary_key_column_indexes) { >>> switch(schema.Column(idx).type()) { >>> case KuduColumnSchema::INT16: // GetInt16/SetInt16 >>> case KuduColumnSchema::INT32: // GetInt32/SetInt32 >>> case KuduColumnSchema::STRING: // GetString/SetString >>> // and so on... >>> } >>> } >>> >>> Is there a better way? >>> >>> >> >> >> -- >> Todd Lipcon >> Software Engineer, Cloudera >> > > -- Todd Lipcon Software Engineer, Cloudera
