Re: Cassandra on RocksDB experiment result

2017-04-26 Thread Samba
Some features may work with some storage engines but not with others; for
example, storing large blobs may be efficient in one storage engine while
much worse in another. Perhaps some storage engines may want to SKIP some
features or add more.

If a storage engine skips a feature, how should the query executor handle
the response or the lack of one?
If a storage engine provides a new feature, how should that be enabled for
that particular storage engine alone?
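
One possible way to frame these two questions is a per-engine capability
set that the query executor checks before it relies on an optional feature.
The following is only a minimal sketch in Python. The names Feature,
StorageEngine, EngineA, EngineB and require are hypothetical and are not
taken from Cassandra or from any proposed interface.

```python
from enum import Enum


class Feature(Enum):
    """Hypothetical optional features an engine may or may not provide."""
    LARGE_BLOBS = "large_blobs"
    WIDE_COLUMNS = "wide_columns"
    STREAMING = "streaming"


class UnsupportedFeatureError(Exception):
    """Raised when a query needs a feature the engine has skipped."""


class StorageEngine:
    """Base class: each engine advertises the features it supports."""
    name = "abstract"
    features = frozenset()

    def supports(self, feature):
        return feature in self.features


class EngineA(StorageEngine):
    """Illustrative engine that provides every feature."""
    name = "engine-a"
    features = frozenset(Feature)


class EngineB(StorageEngine):
    """Illustrative engine that skips blobs and streaming."""
    name = "engine-b"
    features = frozenset({Feature.WIDE_COLUMNS})


def require(engine, feature):
    """Fail fast, with a clear error, instead of returning nothing."""
    if not engine.supports(feature):
        raise UnsupportedFeatureError(
            f"{engine.name} does not support {feature.value}"
        )
```

With this shape, the executor would call require(engine, Feature.STREAMING)
before a streaming operation, and a feature that only one engine offers
would simply appear in that engine's set alone.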

On Wed, Apr 26, 2017 at 5:07 AM, Dikang Gu wrote:

> I created several tickets to start the discussion, please feel free to
> comment on the JIRAs. I'm also open for suggestions about other efficient
> ways to discuss it.
>
> https://issues.apache.org/jira/browse/CASSANDRA-13474
> https://issues.apache.org/jira/browse/CASSANDRA-13475
> https://issues.apache.org/jira/browse/CASSANDRA-13476
>
> Thanks
> Dikang.
>
> On Mon, Apr 24, 2017 at 9:53 PM, Dikang Gu wrote:
>
> > Thanks everyone for the feedback and suggestions! They are all very
> > helpful. I'm looking forward to having more discussions about the
> > implementation details.
> >
> > As the next step, we will focus on three areas:
> > 1. Pluggable storage engine interface.
> > 2. Wide column support on RocksDB.
> > 3. Streaming support on RocksDB.
> >
> > I will go ahead and create some JIRAs, to start the discussion about
> > pluggable storage interface, and how to plug RocksDB into Cassandra.
> >
> > Please let me know your thoughts.
> >
> > Thanks!
> > Dikang.
> >
> > On Mon, Apr 24, 2017 at 12:42 PM, Patrick McFadin wrote:
> >
> >> Dikang,
> >>
> >> First I want to thank you and everyone else at Instagram for the
> >> engineering talent you have devoted to the Cassandra project. Here's yet
> >> another great example.
> >>
> >> He's going to hate me for dragging him into this, but Vijay
> >> Parthasarathy has done some exploratory work before on integrating
> >> non-Java storage into Cassandra. He might be a helpful person to
> >> consult.
> >>
> >> Patrick
> >>
> >>
> >>
> >> On Sun, Apr 23, 2017 at 4:25 PM, Nate McCall wrote:
> >>
> >> > > Please take a look and let me know your thoughts. I think the
> >> > > biggest latency win comes from getting rid of most of the Java
> >> > > garbage created by the current read/write path and compactions,
> >> > > which reduces the JVM overhead and makes the latency more
> >> > > predictable.
> >> > >
> >> >
> >> > I want to put this here for the record:
> >> > https://issues.apache.org/jira/browse/CASSANDRA-2995
> >> >
> >> > There are some valid points in the above about increased surface area
> >> > and end-user confusion. That said, just under six years is a long
> >> > time. I think we are a more mature project now and I completely agree
> >> > with others about the positive impacts of testability this would
> >> > inherently provide.
> >> >
> >> > +1 from me.
> >> >
> >> > Dikang, thank you for opening this discussion and sharing your
> >> > efforts so far.
> >> >
> >>
> >
> >
> >
> > --
> > Dikang
> >
> >
>
>
> --
> Dikang
>


Re: restrictions on IN operator

2016-09-13 Thread Samba
I am still getting the following error when trying to run a query with
non-equality conditions in the WHERE clause:


Caused by: com.datastax.driver.core.exceptions.InvalidQueryException:
Clustering column "author" cannot be restricted (preceding column "timing"
is restricted by a non-EQ relation)


Here are the version details of Cassandra:

 show version;
[cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]

If IN conditions can be run on any (clustering) column, then I suppose
non-equality conditions should also be supported. Is my expectation wrong?
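
For reference, the rule that the error message describes can be modelled
roughly as below. This is a simplified sketch in Python, not Cassandra's
actual validation code. It assumes that "timing" precedes "author" in the
clustering key (as the error message implies), and it treats = and IN as
point restrictions and <, <=, >, >= as slices. The function name and the
input format are made up for illustration.

```python
# Operators that select a range (slice) rather than specific values.
SLICE_OPS = {"<", "<=", ">", ">="}


def first_violation(clustering_columns, restrictions):
    """Return (column, preceding_column) for the first clustering column
    that is restricted after an earlier column carries a slice
    restriction, or None if the restrictions pass this one check.

    clustering_columns: column names in clustering-key order.
    restrictions: dict mapping a column name to its operator string.
    """
    slice_column = None
    for column in clustering_columns:
        operator = restrictions.get(column)
        if operator is None:
            continue  # column is not restricted in this query
        if slice_column is not None:
            # An earlier column already carries a non-EQ (slice) relation.
            return (column, slice_column)
        if operator in SLICE_OPS:
            slice_column = column
    return None


# Mirrors the error above: a range on "timing", then "author" restricted.
print(first_violation(["timing", "author"], {"timing": ">", "author": "="}))
# IN on "timing" is a point restriction, so a range on "author" passes.
print(first_violation(["timing", "author"], {"timing": "IN", "author": ">"}))
```

The intuition is that rows within a partition are sorted by the clustering
columns in order, so equality or IN on a prefix followed by one range still
maps to contiguous runs of rows, whereas a range on an earlier column
followed by a restriction on a later one does not.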


On Tue, Sep 6, 2016 at 10:08 PM, Benjamin Lerer <benjamin.le...@datastax.com> wrote:

> Since 2.2, IN restrictions are supported on any partition key or clustering
> column in SELECT statements. For UPDATE and DELETE statements they are
> supported since 3.0.
>
> Benjamin
>
> On Tue, Sep 6, 2016 at 11:19 AM, Samba <saas...@gmail.com> wrote:
>
> > Hi,
> >
> > I understand, from the documentation, that the IN operator is permitted
> > only on the last column of the partition key and/or on the last column
> > of the clustering key.
> >
> > I can understand IN on a partition key column being nondeterministic,
> > but I wonder why IN is permitted only on one (the last) clustering
> > column. Don't all the records differing only in clustering columns stay
> > on the same node? Is it something impossible, or is it scheduled for the
> > future?
> >
> >
> > Alternatively, why not distribute the query to all the nodes matching
> > the IN condition, in parallel, and join the result sets or return them
> > as futures? Perhaps this is what map-reduce does -- but why shouldn't a
> > distributed database execute its queries (functions & aggregates too)
> > on the matching nodes in its cluster?
> >
> > Could you please try to explain the rationale behind why it has been
> > done so, and whether there are any plans to enhance this behaviour in
> > the near future?
> >
> > Thanks & Regards,
> > Samba
> >
>


restrictions on IN operator

2016-09-06 Thread Samba
Hi,

I understand, from the documentation, that the IN operator is permitted only
on the last column of the partition key and/or on the last column of the
clustering key.

I can understand IN on a partition key column being nondeterministic, but
I wonder why IN is permitted only on one (the last) clustering column. Don't
all the records differing only in clustering columns stay on the same node?
Is it something impossible, or is it scheduled for the future?


Alternatively, why not distribute the query to all the nodes matching the
IN condition, in parallel, and join the result sets or return them as
futures? Perhaps this is what map-reduce does -- but why shouldn't a
distributed database execute its queries (functions & aggregates too) on
the matching nodes in its cluster?
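
To make the idea concrete, here is a rough client-side sketch in Python of
the scatter-gather pattern I have in mind: one lookup per value in the IN
list, run in parallel, with the result sets joined afterwards. The data,
fetch_partition and scatter_gather are hypothetical stand-ins, and the
sketch says nothing about how Cassandra itself executes IN.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for partitions living on different nodes.
FAKE_PARTITIONS = {
    "k1": [("k1", 1), ("k1", 2)],
    "k2": [("k2", 7)],
}


def fetch_partition(key):
    """Stand-in for a single-partition query sent to the owning node."""
    return list(FAKE_PARTITIONS.get(key, []))


def scatter_gather(keys, fetch=fetch_partition, max_workers=4):
    """Run one fetch per key concurrently and join the result sets.

    Results are concatenated in the order of `keys`, because
    Executor.map yields results in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_key_rows = list(pool.map(fetch, keys))
    return [row for rows in per_key_rows for row in rows]


rows = scatter_gather(["k2", "k1", "missing"])
```

Returning the individual futures instead of the joined list would be the
"return as futures" variant.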

Could you please try to explain the rationale behind why it has been done
so, and whether there are any plans to enhance this behaviour in the near
future?

Thanks & Regards,
Samba