Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Jason Brown
Hi all,

I'd like to de-escalate a bit here.

Since this is an Apache and an OSS project, contributions come in many
forms: code, speaking/advocacy, documentation, support, project management,
and so on. None of these things come for free.

Ken, I appreciate you bringing up these usability topics; they are certainly
valid concerns. You've mentioned you are working on a posting of some sort
that I think will amount to an enumerated list of the topics/issues you
feel need addressing. Some may be simple changes, some may be more
invasive, some we can consider implementing, some not. I look forward to a
positive discussion.

I think what would be best would be for you to complete that list and work
with the community, in a *positive and constructive manner*, towards
getting it done. That is certainly contributing, and contributing in a big
way: project management. Working with the community is going to be the most
beneficial path for everyone.

Ken, if you feel like you'd like some help getting such an initiative
going, and contributing substantively to it (not necessarily in terms of
code) please feel free to reach out to me directly (jasedbr...@gmail.com).

Hoping this leads somewhere positive, that benefits everyone,

-Jason



On Wed, Feb 21, 2018 at 2:53 PM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

> Hi Akash,
>
> I get the part about outside work, which is why, in replying to Jeff Jirsa, I
> was suggesting the big companies could justify taking it on easily enough and,
> you know, actually pay the people who would be working on it so those people
> could have a life.
>
> The part I don't get is the aversion to usability.  Isn't that what you
> think about when you are coding?  "Am I making this thing I'm building easy
> to use?"  If you were programming for me, we would be constantly talking
> about what we are building and how we can make things easier for users.  If
> I had to fight with a developer, architect or engineer about usability all
> the time, they would be gone and quick.  How do you approach programming if you
> aren't trying to make things easy?
>
> Kenneth Brotman
>
> -Original Message-
> From: Akash Gangil [mailto:akashg1...@gmail.com]
> Sent: Wednesday, February 21, 2018 2:24 PM
> To: d...@cassandra.apache.org
> Cc: user@cassandra.apache.org
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
> I would second Jon in the arguments he made. Contributing outside work is
> draining and really requires a lot of commitment. If someone requires
> features around usability etc, just pay for it, period.
>
> On Wed, Feb 21, 2018 at 2:20 PM, Kenneth Brotman <
> kenbrot...@yahoo.com.invalid> wrote:
>
> > Jon,
> >
> > Very sorry that you don't see the value of the time I'm taking for this.
> > I don't have demands; I do have a stern warning and I'm right Jon.
> > Please be very careful not to mischaracterize my words Jon.
> >
> > You suggest I put things in JIRA's, then seem to suggest that I'd be
> > lucky if anyone looked at it and did anything. That's what I figured too.
> >
> > I don't appreciate the hostility.  You will understand more fully in
> > the next post where I'm coming from.  Try to keep the conversation
> > civilized.
> > I'm trying or at least so you understand I think what I'm doing is
> > saving your gig and mine.  I really like a lot of people in this group.
> >
> > I've come to a preliminary assessment on things.  Soon the cloud will
> > clear or I'll be gone.  Don't worry.  I'm a very peaceful person and
> > like you I am driven by real important projects that I feel compelled
> > to work on for the good of others.  I don't have time for people to
> > hand hold a database and I can't get stuck with my projects on the wrong
> > stuff.
> >
> > Kenneth Brotman
> >
> >
> > -Original Message-
> > From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon
> > Haddad
> > Sent: Wednesday, February 21, 2018 12:44 PM
> > To: user@cassandra.apache.org
> > Cc: d...@cassandra.apache.org
> > Subject: Re: Cassandra Needs to Grow Up by Version Five!
> >
> > Ken,
> >
> > Maybe it’s not clear how open source projects work, so let me try to
> > explain.  There’s a bunch of us who either get paid by someone or
> > volunteer on our free time.  The folks that get paid, (yay!) usually
> > take direction on what the priorities are, and work on projects that
> > directly affect our jobs.  That means that someone needs to care
> > enough about the features you want to work on them, if you’re not going
> > to do it yourself.
> >
> > Now as others have said already, please put your list of demands in
> > JIRA, if someone is interested, they will work on it.  You may need to
> > contribute a little more than you’ve done already, be prepared to get
> > involved if you actually want to see something get done.  Perhaps
> > learning a little more about Cassandra’s internals and the people
> > involved will reveal some of the design decisions and priorities of the
> > project.
> >

Re: Hints replay incompatible between 2.x and 3.x

2017-08-30 Thread Jason Brown
Hi Andrew,

This question is best for the user@ list, included here.

Thanks,

-Jason

On Wed, Aug 30, 2017 at 10:00 AM, Andrew Whang wrote:

> In evaluating 3.x, we found that hints are unable to be replayed between
> 2.x and 3.x nodes. This introduces a risk during the upgrade path for some
> of our write-heavy clusters - nodes will accumulate upwards of 1TB of hints
> if a node goes/remains down for <1hr.
>
> Any suggestions to mitigate this issue?
>


Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread Jason Brown
Removing dev@ from this conversation, as the thread is more appropriate
for user@.

On Mon, Jun 12, 2017 at 4:51 AM, Eduardo Alonso wrote:

> -Virtual tokens are not recommended when using SOLR or
> cassandra-lucene-index.
>
> If you use your table schema you will not have any problem with partition
> size because your table is *not* a WIDE row table (it does not have
> clustering keys).
> A single record with those 15 or 20 columns must not be larger than
> 100MB. You will have enough.
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd*
>
> 2017-06-12 12:36 GMT+02:00 @Nandan@ :
>
> > And with a single videos table, it may end up with around 15 to 20
> > columns, so we also need to think very carefully about partition sizes.
> >
> > On Mon, Jun 12, 2017 at 6:33 PM, @Nandan@ wrote:
> >
> >> Yes, this is the only option; I was also considering it as my second
> >> option. Before this I was thinking of denormalizing tables based on the
> >> search columns, but because of partial search that would not be effective.
> >>
> >> Now suppose we go with this single videos table, implemented with
> >> Solr/Lucene: do we then also need to care about num_tokens?
> >>
> >>
> >> On Mon, Jun 12, 2017 at 6:27 PM, Eduardo Alonso <
> >> eduardoalo...@stratio.com> wrote:
> >>
> >>> Using cassandra collections
> >>>
> >>> CREATE TABLE videos (
> >>> videoid uuid primary key,
> >>> title text,
> >>> actor list<text>,
> >>> producer list<text>,
> >>> release_date timestamp,
> >>> description text,
> >>> music text,
> >>> etc...
> >>> );
> >>>
> >>> When using collections you need to take care of their length. Collections
> >>> are designed to store only a small amount of data.
> >>> 5/10 actors per movie is ok.
> >>>
> >>>
> >>> Eduardo Alonso
> >>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> >>> 28224 Pozuelo de Alarcón, Madrid
> >>> Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd*
> >>>
> >>> 2017-06-12 11:54 GMT+02:00 @Nandan@ :
> >>>
>  So in short, we have to go with one single videos table and use
>  videoid uuid as the primary key.
>  But then how can we handle multiple actor names and producer
>  names?
> 
>  On Mon, Jun 12, 2017 at 5:51 PM, Eduardo Alonso <
>  eduardoalo...@stratio.com> wrote:
> 
> > Yes, you are right.
> >
> > Table denormalization is useful just when you have unique primary
> > keys, not your case.
> > Denormalized tables differ only in their primary key; every
> > denormalized table contains all the data (it just changes how it is
> > structured). So, if you need to index it, do it with just one table
> > (the one you showed us with videoid as the primary key is ok).
> >
> > Solr, Elastic and cassandra-lucene-index are all based on Lucene and
> > all of them fulfill all your needs.
> >
> > Solr (in DSE) and cassandra-lucene-index are very well
> > integrated with cassandra using its secondary index interface. If you
> > choose elastic search you will need to code the integration (write
> > mutex, synchronization of both clusters (imagine something written in
> > cassandra but failed to write in elastic)).
> >
> > I know I am not the most suitable person to recommend our product
> > cassandra-lucene-index, but it is open
> > source, just take a look.
> >
> > Eduardo Alonso
> > Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> > 28224 Pozuelo de Alarcón, Madrid
> > Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd*
> >
> > 2017-06-12 11:18 GMT+02:00 @Nandan@:
> >
> >> Hi Eduardo,
> >>
> >> We are trying to build advanced search functionality in
> >> which we can do partial searches based on the actor, producer,
> >> director, etc. columns.
> >> So if we denormalize the tables, then we have to create tables
> >> such as the ones below:
> >> video_by_actor
> >> video_by_producer
> >> video_by_director
> >> video_by_date
> >> etc..
> >> With denormalized tables, Cassandra only allows us to do equality
> >> searches, but to implement partial search we would need to implement
> >> Solr on all the above tables.
> 

Re: Long running compaction on huge hint table.

2017-05-16 Thread Jason Brown
Varun,

This message is better suited to the user@ ML.

Thanks,

-Jason

On Tue, May 16, 2017 at 3:41 AM, varun saluja  wrote:

> Hi Experts,
>
> We are facing an issue on a production cluster. Compaction on the system.hints
> table has been running for the last 2 days.
>
>
> pending tasks: 1
>    compaction type   keyspace   table     completed          total    unit   progress
>         Compaction     system   hints   20623021829   877874092407   bytes      2.35%
> Active compaction remaining time :   0h27m15s
>
>
> The active compaction remaining time is shown in minutes, but this job
> seems to be running indefinitely.
>
> We have a 3-node cluster on version 2.1.7, and we ran a write-intensive job
> last week on a particular table.
> Compaction on this table finished, but the hints table size is growing
> continuously.
>
> Can someone please help me?
>
>
> Thanks & Regards,
> Varun Saluja
>
>


RE: How to store large columns?

2013-01-21 Thread Jason Brown
The reason for multiple keys (and, by extension, multiple columns) is to better 
distribute the write/read load across the cluster as keys will (hopefully) be 
distributed on different nodes. This helps to avoid hot spots.
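
As a rough illustration of that point, here is a toy sketch in Python. It is not how Cassandra's partitioner actually works: the four node names, the `$chunkN` key suffix, and the MD5-modulo placement are all assumptions made for the example (a real cluster maps tokens onto ring ranges). It only shows that one row key always lands on one node, while many chunk keys hash independently.

```python
import hashlib

# Hypothetical four-node cluster; the names are made up for illustration.
NODES = ["node-a", "node-b", "node-c", "node-d"]


def node_for_key(row_key: str) -> str:
    """Toy placement: hash the row key and pick a node by modulo.

    A real partitioner assigns token ranges on a ring; this is only a stand-in.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest, "big") % len(NODES)]


def placement(keys):
    """Count how many of the given row keys land on each node."""
    counts = {node: 0 for node in NODES}
    for key in keys:
        counts[node_for_key(key)] += 1
    return counts


# One object under a single row key: all 16 chunk columns hit the same node.
single_key = placement(["video-42"] * 16)

# The same object as 16 separate chunk keys: each key is hashed on its own.
multi_key = placement([f"video-42$chunk{i}" for i in range(16)])
```

Comparing `single_key` with `multi_key` shows the hot spot in the first case and the spread in the second.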

Hope this helps,

-Jason Brown
Netflix

From: Sávio Teles [savio.te...@lupa.inf.ufg.br]
Sent: Monday, January 21, 2013 9:51 AM
To: user@cassandra.apache.org
Subject: Re: How to store large columns?

Astyanax splits large objects into multiple keys. Is that a good idea? Or is it 
better to split into multiple columns?

Thanks

2013/1/21 Sávio Teles savio.te...@lupa.inf.ufg.br

Thanks Keith Wright.


2013/1/21 Keith Wright kwri...@nanigans.com
This may be helpful:  
https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store

From: Vegard Berget p...@fantasista.no
Reply-To: user@cassandra.apache.org, Vegard Berget p...@fantasista.no
Date: Monday, January 21, 2013 8:35 AM
To: user@cassandra.apache.org
Subject: Re: How to store large columns?



Hi,

You could split it into multiple columns on the client side:
RowKeyData: Part1: [1mb], Part2: [1mb], Part3: [1mb]...PartN[1mb]

Now you can use multiple get() calls in parallel to get the parts back and then 
join them back into one file.
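
A minimal sketch of the client-side split and join, in Python. It makes no Cassandra calls: a plain dict stands in for the columns of one row, and the function names and zero-padded `PartNNNNNN` column names are assumptions chosen so the parts sort back into order.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MB per column, as in the example above


def split_into_columns(data: bytes, chunk_size: int = CHUNK_SIZE) -> dict:
    """Split a blob into {column_name: chunk}, each chunk at most chunk_size bytes.

    Column names are zero-padded so that sorting them restores the original order.
    """
    columns = {}
    for index, offset in enumerate(range(0, len(data), chunk_size), start=1):
        columns[f"Part{index:06d}"] = data[offset:offset + chunk_size]
    return columns


def join_columns(columns: dict) -> bytes:
    """Reassemble the blob by concatenating the chunks in column-name order."""
    return b"".join(columns[name] for name in sorted(columns))


# Round trip with a small chunk size so the pieces are easy to inspect.
parts = split_into_columns(b"abcdefghij", chunk_size=4)
restored = join_columns(parts)
```

In a real client each dict entry would be one column write, and the reads could be issued in parallel before calling `join_columns`.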

I _think_ maybe the new CQL3-protocol does not have the same limitation, but I 
have never tried large columns there, so someone with more experience than me 
will have to confirm this.

.vegard,

- Original Message -
From: user@cassandra.apache.org
To: user@cassandra.apache.org
Sent: Mon, 21 Jan 2013 11:16:40 -0200
Subject: How to store large columns?


We wish to store a column in a row whose size is larger than 
thrift_framed_transport_size_in_mb, the maximum Thrift frame size 
configured in cassandra.yaml.
How can we store columns larger than 
thrift_framed_transport_size_in_mb? Increasing this value does not solve the 
problem, since we have columns with varying sizes.

--
Regards,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
Master's student in Computer Science - UFG
Software Architect
Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG


