We don't truncate the hash, we mod it. Why would you expect that data
wouldn't be evenly distributed? We've not seen this to be the case.
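
To make "we mod it" concrete, here is a minimal sketch of deriving a single
salt byte from a row key and prepending it. This is an illustration only:
zlib.crc32 stands in for whatever stable hash is actually used, and the names
salt_byte, salted_key and bucket_counts are hypothetical, not Phoenix API.

```python
import zlib
from collections import Counter


def salt_byte(row_key: bytes, buckets: int) -> int:
    """Return a stable bucket number in [0, buckets).

    Assumption: crc32 is only a stand-in for a stable hash. The point is
    that the full hash value is reduced with a modulo, not truncated.
    """
    if not 1 <= buckets <= 256:
        raise ValueError("a single salt byte supports 1..256 buckets")
    return zlib.crc32(row_key) % buckets


def salted_key(row_key: bytes, buckets: int) -> bytes:
    """Prepend the one-byte bucket number to the original row key."""
    return bytes([salt_byte(row_key, buckets)]) + row_key


def bucket_counts(row_keys, buckets: int) -> Counter:
    """Count how many keys land in each bucket, to eyeball the spread."""
    return Counter(salt_byte(k, buckets) for k in row_keys)


if __name__ == "__main__":
    keys = [str(ts).encode() for ts in range(1000)]
    print(sorted(bucket_counts(keys, 8).items()))
```

Running bucket_counts over a sample of real row keys is a cheap way to check
how evenly a given hash and bucket count spread the data before creating the
table.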



On Mon, Oct 21, 2013 at 1:48 PM, Michael Segel <[email protected]> wrote:

> What do you call hashing the row key?
> Or hashing the row key and then appending the row key to the hash?
> Or hashing the row key, truncating the hash value to some subset and then
> appending the row key to the value?
>
> The problem is that there is specific meaning to the term salt. Re-using
> it here will cause confusion because you're implying something you don't
> mean to imply.
>
> you could say prepend a truncated hash of the key, however… is prepend a
> real word? ;-) (I am sorry, I am not a grammar nazi, nor an English major. )
>
> So even outside of Phoenix, the concept is the same.
> Even with a truncated hash, you will find that over time, all but the tail
> N regions will only be half full.
> This could be both good and bad.
>
> (Where N is your number 8 or 16 allowable hash values.)
>
> You've solved potentially one problem… but still have other issues that
> you need to address.
> I guess the simple answer is to double the region sizes and not care that
> most of your regions will be 1/2 the max size…  but the size you really
> want and 8-16 regions will be up to twice as big.
>
>
>
> On Oct 21, 2013, at 3:26 PM, James Taylor <[email protected]> wrote:
>
> > What do you think it should be called, because
> > "prepending-row-key-with-single-hashed-byte" doesn't have a very good ring
> > to it. :-)
> >
> > Agree that getting the row key design right is crucial.
> >
> > The range of "prepending-row-key-with-single-hashed-byte" is declarative
> > when you create your table in Phoenix, so you typically declare an upper
> > bound based on your cluster size (not 255, but maybe 8 or 16). We've run
> > the numbers and it's typically faster, but as with most things, not always.
> >
> > HTH,
> > James
> >
> >
> > On Mon, Oct 21, 2013 at 1:05 PM, Michael Segel <[email protected]> wrote:
> >
> >> Then it's not a SALT. And please don't use the term 'salt' because it has
> >> a specific meaning outside of what you want it to mean.  Just like saying
> >> HBase has ACID because you write the entire row as an atomic element. But
> >> I digress….
> >>
> >> Ok so to your point…
> >>
> >> 1 byte == 255 possible values.
> >>
> >> So which will be faster.
> >>
> >> creating a list of the 1 byte truncated hash of each possible timestamp in
> >> your range, or doing 255 separate range scans with the start and stop range
> >> key set?
> >>
> >> That will give you the results you want, however… I'd go back and have
> >> them possibly rethink the row key if they can … assuming this is the base
> >> access pattern.
> >>
> >> HTH
> >>
> >> -Mike
> >>
> >>
> >>
> >>
> >>
> >> On Oct 21, 2013, at 11:37 AM, James Taylor <[email protected]> wrote:
> >>
> >>> Phoenix restricts salting to a single byte.
> >>> Salting perhaps is misnamed, as the salt byte is a stable hash based on the
> >>> row key.
> >>> Phoenix's skip scan supports sub-key ranges.
> >>> We've found salting in general to be faster (though there are cases where
> >>> it's not), as it ensures better parallelization.
> >>>
> >>> Regards,
> >>> James
> >>>
> >>>
> >>>
> >>> On Mon, Oct 21, 2013 at 9:14 AM, Vladimir Rodionov
> >>> <[email protected]>wrote:
> >>>
> >>>> FuzzyRowFilter does not work on sub-key ranges.
> >>>> Salting is bad for any scan operation, unfortunately. When salt prefix
> >>>> cardinality is small (1-2 bytes),
> >>>> one can try something similar to FuzzyRowFilter but with additional
> >>>> sub-key range support.
> >>>> If salt prefix cardinality is high (> 2 bytes) - do a full scan with your
> >>>> own Filter (for timestamp ranges).
> >>>>
> >>>> Best regards,
> >>>> Vladimir Rodionov
> >>>> Principal Platform Engineer
> >>>> Carrier IQ, www.carrieriq.com
> >>>> e-mail: [email protected]
> >>>>
> >>>> ________________________________________
> >>>> From: Premal Shah [[email protected]]
> >>>> Sent: Sunday, October 20, 2013 10:42 PM
> >>>> To: user
> >>>> Subject: Re: row filter - binary comparator at certain range
> >>>>
> >>>> Have you looked at FuzzyRowFilter? Seems to me that it might satisfy your
> >>>> use-case.
> >>>>
> >>>> http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/
> >>>>
> >>>>
> >>>> On Sun, Oct 20, 2013 at 9:31 PM, Tony Duan <[email protected]> wrote:
> >>>>
> >>>>> Alex Vasilenko <aa.vasilenko@...> writes:
> >>>>>
> >>>>>>
> >>>>>> Lars,
> >>>>>>
> >>>>>> But how will it behave when I have a salt at the beginning of the key
> >>>>>> to properly shard the table across regions? Imagine a row key of
> >>>>>> format salt:timestamp, with rows like this:
> >>>>>> ...
> >>>>>> 1:15
> >>>>>> 1:16
> >>>>>> 1:17
> >>>>>> 1:23
> >>>>>> 2:3
> >>>>>> 2:5
> >>>>>> 2:12
> >>>>>> 2:15
> >>>>>> 2:19
> >>>>>> 2:25
> >>>>>> ...
> >>>>>>
> >>>>>> And I want to find all rows that have the second part (timestamp) in
> >>>>>> the range 15-25. What startKey and endKey should be used?
> >>>>>>
> >>>>>> Alexandr Vasilenko
> >>>>>> Web Developer
> >>>>>> Skype:menterr
> >>>>>> mob: +38097-611-45-99
> >>>>>>
> >>>>>> 2012/2/9 lars hofhansl <lhofhansl@...>
> >>>>> Hi Alexandr Vasilenko,
> >>>>> Have you ever resolved this issue? I am also facing this issue.
> >>>>> I also want to implement this functionality.
> >>>>> Imagine a row key of format
> >>>>> salt:timestamp, with rows like this:
> >>>>> ...
> >>>>> 1:15
> >>>>> 1:16
> >>>>> 1:17
> >>>>> 1:23
> >>>>> 2:3
> >>>>> 2:5
> >>>>> 2:12
> >>>>> 2:15
> >>>>> 2:19
> >>>>> 2:25
> >>>>> ...
> >>>>>
> >>>>> And I want to find all rows that have the second part (timestamp) in
> >>>>> the range 15-25.
> >>>>>
> >>>>> Could you please tell me how you resolved this?
> >>>>> Thanks in advance.
> >>>>>
> >>>>>
> >>>>> Tony duan
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Regards,
> >>>> Premal Shah.
> >>>>
> >>>>
> >>
> >>
>
>
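
For the original question in this thread (keys of the form salt:timestamp,
find timestamps 15-25), the "one range scan per salt value" approach can be
sketched as follows. This is a rough illustration, not HBase or Phoenix code:
it assumes a one-byte salt followed by an 8-byte big-endian timestamp, models
the table as a sorted in-memory list, and uses an inclusive start key and an
exclusive stop key. All function names here are hypothetical.

```python
import struct
from bisect import bisect_left


def encode(salt: int, ts: int) -> bytes:
    """One salt byte plus an 8-byte big-endian timestamp.

    Assumption: fixed-width big-endian encoding, so that byte order of the
    keys matches numeric order of non-negative timestamps.
    """
    return bytes([salt]) + struct.pack(">Q", ts)


def scan_ranges(buckets: int, ts_start: int, ts_stop: int):
    """One (start_key, stop_key) pair per salt value; stop is exclusive."""
    return [(encode(s, ts_start), encode(s, ts_stop)) for s in range(buckets)]


def range_scan(sorted_keys, start: bytes, stop: bytes):
    """Stand-in for a table scan over [start, stop) on sorted row keys."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


def find_timestamps(sorted_keys, buckets: int, ts_first: int, ts_last: int):
    """Run one scan per bucket and merge the decoded (salt, timestamp) hits."""
    hits = []
    for start, stop in scan_ranges(buckets, ts_first, ts_last + 1):
        for key in range_scan(sorted_keys, start, stop):
            hits.append((key[0], struct.unpack(">Q", key[1:])[0]))
    return hits


# The sample rows from the question, stored in sorted key order.
rows = sorted(
    encode(s, t)
    for s, t in [(1, 15), (1, 16), (1, 17), (1, 23), (2, 3), (2, 5),
                 (2, 12), (2, 15), (2, 19), (2, 25)]
)

if __name__ == "__main__":
    print(find_timestamps(rows, 3, 15, 25))
```

With a small declared number of buckets (8 or 16, as mentioned above) this is
a handful of scans that can run in parallel. With every possible byte value
in play it becomes hundreds of scans, which is the trade-off being argued
about in this thread.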
