Re: Example Data Modelling

Carlos Alonso Tue, 07 Jul 2015 07:04:46 -0700

Hi Jerome,

Good point!! Really a nice usage of static columns! BTW, wouldn't the EmpID
be static as well?


Cheers

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 7 July 2015 at 14:42, Jérôme Mainaud <jer...@mainaud.com> wrote:

> Hello,
>
> You can slightly adapt Carlos answer to reduce repliation of data that
> don't change for month to month.
> Static columns are great for this.
>
> The table become:
>
> CREATE TABLE salaries (
>   EmpID varchar,
>   FN varchar *static*,
>   LN varchar *static*,
>   Phone varchar *static*,
>   Address varchar *static*,
>   month integer,
>   basic integer,
>   flexible_allowance float,
>   PRIMARY KEY(EmpID, month)
> )
>
> There is only one copy of static column per partition the value is shared
> between all rows of the partition.
> When Employee data change you can update it with the partition key in the
> where clause.
> When you insert a new month entry you just fill non static columns.
> The table can be queried the same way as the original one.
>
> Cheers
>
>
>
> --
> Jérôme Mainaud
> jer...@mainaud.com
>
> 2015-07-07 11:51 GMT+02:00 Rory Bramwell, DevOp Services <
> rory.bramw...@devopservices.com>:
>
>> Hi,
>>
>> I've been following this thread and my thoughts are inline with Carlos'
>> latest response... Model your data to suite your queries. That is one of
>> the data model / design considerations in Cassandra that differs from the
>> RDBMS world. Embrace demoralization and data duplication. Disk space is
>> cheapest, so exploit how your data is laid out in order to optimize for
>> faster reads (which are more costly than writes).
>>
>> Regards,
>>
>> Rory Bramwell
>> Founder and CEO
>> DevOp Services
>>
>> Skype: devopservices
>> Email: rory.bramw...@devopservices.com
>> Web: www.devopservices.com
>> On Jul 7, 2015 4:02 AM, "Carlos Alonso" <i...@mrcalonso.com> wrote:
>>
>>> I guess you're right, using my proposal, getting last employee's record
>>> is straightforward and quick, but also, as Peter pointed, getting all slips
>>> for a particular month requires you to know all the employee IDs and,
>>> ideally, run a query for each employee. This would work depending on how
>>> many employees you're managing.
>>>
>>> At this moment I'm beginning to feel that maybe using both approaches is
>>> the best way to go. And I think this is one of Cassandra's recommendations:
>>> Write your data in several formats if required to fit your reads. Therefore
>>> I'd use my suggestion for getting a salary by employee ID and I'd also have
>>> Peter's one to run the end of the month query.
>>> Does it make sense?
>>>
>>> Cheers!
>>>
>>> Carlos Alonso | Software Engineer | @calonso
>>> <https://twitter.com/calonso>
>>>
>>> On 7 July 2015 at 09:07, Srinivasa T N <seen...@gmail.com> wrote:
>>>
>>>> Thanks for the inputs.
>>>>
>>>> Now my question is how should the app populate the duplicate data,
>>>> i.e., if I have an employee record (along with his FN, LN,..) for the month
>>>> of Apr and later I am populating the same record for the month of may (with
>>>> salary changed), should my application first read/fetch the corresponding
>>>> data for apr and re-insert with modification for month of may?
>>>>
>>>> Regards,
>>>> Seenu.
>>>>
>>>> On Tue, Jul 7, 2015 at 11:32 AM, Peer, Oded <oded.p...@rsa.com> wrote:
>>>>
>>>>>  The data model suggested isn’t optimal for the “end of month” query
>>>>> you want to run since you are not querying by partition key.
>>>>>
>>>>> The query would look like “select EmpID, FN, LN, basic from salaries
>>>>> where month = 1” which requires filtering and has unpredictable 
>>>>> performance.
>>>>>
>>>>>
>>>>>
>>>>> For this type of query to be fast you can use the “month” column as
>>>>> the partition key and the “EmpID” and the clustering column.
>>>>>
>>>>> This approach also has drawbacks:
>>>>>
>>>>> 1. This data model creates a wide row. Depending on the number of
>>>>> employees this partition might be very large. You should limit partition
>>>>> sizes to 25MB
>>>>>
>>>>> 2. Distributing data according to month means that only a small number
>>>>> of nodes will hold all of the salary data for a specific month which might
>>>>> cause hotspots on those nodes.
>>>>>
>>>>>
>>>>>
>>>>> Choose the approach that works best for you.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *From:* Carlos Alonso [mailto:i...@mrcalonso.com]
>>>>> *Sent:* Monday, July 06, 2015 7:04 PM
>>>>> *To:* user@cassandra.apache.org
>>>>> *Subject:* Re: Example Data Modelling
>>>>>
>>>>>
>>>>>
>>>>> Hi Srinivasa,
>>>>>
>>>>>
>>>>>
>>>>> I think you're right, In Cassandra you should favor denormalisation
>>>>> when in RDBMS you find a relationship like this.
>>>>>
>>>>>
>>>>>
>>>>> I'd suggest a cf like this
>>>>>
>>>>> CREATE TABLE salaries (
>>>>>
>>>>>   EmpID varchar,
>>>>>
>>>>>   FN varchar,
>>>>>
>>>>>   LN varchar,
>>>>>
>>>>>   Phone varchar,
>>>>>
>>>>>   Address varchar,
>>>>>
>>>>>   month integer,
>>>>>
>>>>>   basic integer,
>>>>>
>>>>>   flexible_allowance float,
>>>>>
>>>>>   PRIMARY KEY(EmpID, month)
>>>>>
>>>>> )
>>>>>
>>>>>
>>>>>
>>>>> That way the salaries will be partitioned by EmpID and clustered by
>>>>> month, which I guess is the natural sorting you want.
>>>>>
>>>>>
>>>>>
>>>>> Hope it helps,
>>>>>
>>>>> Cheers!
>>>>>
>>>>>
>>>>>   Carlos Alonso | Software Engineer | @calonso
>>>>> <https://twitter.com/calonso>
>>>>>
>>>>>
>>>>>
>>>>> On 6 July 2015 at 13:01, Srinivasa T N <seen...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>    I have basic doubt: I have an RDBMS with the following two tables:
>>>>>
>>>>>    Emp - EmpID, FN, LN, Phone, Address
>>>>>    Sal - Month, Empid, Basic, Flexible Allowance
>>>>>
>>>>>    My use case is to print the Salary slip at the end of each month
>>>>> and the slip contains emp name and his other details.
>>>>>
>>>>>    Now, if I want to have the same in cassandra, I will have a single
>>>>> cf with emp personal details and his salary details.  Is this the right
>>>>> approach?  Should we have the employee personal details duplicated each
>>>>> month?
>>>>>
>>>>> Regards,
>>>>> Seenu.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>

Re: Example Data Modelling

Reply via email to