Re: Manual Repairs

2017-06-21 Thread Ben Slater
The closest you can get to this is to break your repairs up by token range;
you can then pause and restart part way through the set of ranges. There are
some basic scripted approaches around for doing this, but Cassandra Reaper is
probably your best bet for this kind of functionality in most circumstances.
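
A minimal sketch of the range-by-range idea, assuming the token ranges to
repair are already known (for example, taken from the cluster's ring
description) and do not wrap around the ring. The function names, the
checkpoint file and the keyspace name are hypothetical; `-st` and `-et` are
nodetool repair's start-token and end-token options. Each finished subrange is
recorded in a checkpoint file, so stopping the script and starting it again
resumes at the first unfinished subrange instead of starting over. This only
illustrates the approach and is not how Cassandra Reaper is implemented.

```python
import json
import subprocess
from pathlib import Path


def split_range(start, end, parts):
    """Split one non-wrapping token range (start, end] into `parts` subranges."""
    width = (end - start) // parts
    bounds = [start + i * width for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(keyspace, ranges):
    """Build one `nodetool repair` command line per subrange."""
    return [
        ["nodetool", "repair", "-st", str(start), "-et", str(end), keyspace]
        for start, end in ranges
    ]


def run_repairs(commands, checkpoint, runner=subprocess.run):
    """Run the commands in order, skipping those already recorded as done.

    `checkpoint` is a Path to a small file (hypothetical) holding the number
    of subranges that have completed so far.
    """
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else 0
    for index in range(done, len(commands)):
        runner(commands[index], check=True)
        # Record progress only after the subrange repair has finished.
        checkpoint.write_text(json.dumps(index + 1))
```

A run would look like `run_repairs(repair_commands("my_keyspace",
split_range(0, 1000000, 10)), Path("repair.ckpt"))`, where the keyspace and
the range are placeholders. Interrupting the script acts as the "pause"; the
subrange that was in flight is repeated on restart.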

Cheers
Ben

On Thu, 22 Jun 2017 at 08:27 Mark Furlong  wrote:

> Can a repair be paused, and if paused can it be restarted from the point
> of the pause, or does it start over?
>
> Mark Furlong
> Sr. Database Administrator
> mfurl...@ancestry.com
> M: 801-859-7427
> O: 801-705-7115
> 1300 W Traverse Pkwy
> Lehi, UT 84043
>
-- 
Ben Slater
Chief Product Officer


Re: Manual Repairs

2017-06-21 Thread Fay Hou [Storage Service]
What version of Cassandra are you running, and what kind of repair are you
doing: full or incremental? Please post your repair command.

On Wed, Jun 21, 2017 at 3:27 PM, Mark Furlong  wrote:

> Can a repair be paused, and if paused can it be restarted from the point
> of the pause, or does it start over?


Manual Repairs

2017-06-21 Thread Mark Furlong
Can a repair be paused, and if paused can it be restarted from the point of the 
pause, or does it start over?


Re: Count limit

2017-06-21 Thread Vladimir Yudovin
Hi,



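> Somebody told me it is because the count returns a single-row result

He is right: COUNT(*) returns a single row, and LIMIT caps the number of rows
returned, not the number of rows counted.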
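
Since LIMIT only caps the rows returned, one way to get a capped count is to
select a cheap column with the LIMIT and count the returned rows on the client.
A minimal sketch, assuming the rows arrive as any Python iterable;
`bounded_count` is a hypothetical helper, not a driver API. With a real driver
the iterable would be the result of a query along the lines of
`SELECT <key column> FROM product LIMIT 5000`.

```python
from itertools import islice


def bounded_count(rows, cap):
    """Count rows from an iterable, reading at most `cap` of them."""
    # islice stops pulling from `rows` once `cap` items have been consumed.
    return sum(1 for _ in islice(rows, cap))


# Stand-in for a result set; any iterable of rows works the same way.
fake_rows = iter(range(12000))
print(bounded_count(fake_rows, 5000))
```

The count can never exceed the cap, so a result equal to the cap should be read
as "at least that many".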



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






On Wed, 21 Jun 2017 02:43:32 -0400, web master socketman2...@gmail.com wrote:




According to 
http://www.maigfrga.ntweb.co/counting-indexing-and-ordering-cassandra

SELECT COUNT(*) FROM product limit 5000;

should return no more than 5000, but it does not: it counts every row. Why?

Somebody told me it is because the count returns a single-row result, and
somebody else told me it is a bug in the new version of Cassandra.



How can I stop the count early and cap it?










Re: Pagination

2017-06-21 Thread Vladimir Yudovin
Hi,

Can this help you?
https://docs.datastax.com/en/developer/java-driver/2.1/manual/paging/
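
One way to keep an OFFSET/LIMIT contract for existing clients is to emulate
the offset on the server: iterate over the paged result and discard the first
`offset` rows. A minimal sketch, assuming the result set is exposed as a lazy
iterable that fetches further pages on demand; `offset_page` is a hypothetical
helper, not a driver API. The cost grows with the offset, because the skipped
rows are still read.

```python
from itertools import islice


def offset_page(rows, offset, limit):
    """Return up to `limit` rows after skipping the first `offset` rows."""
    # islice consumes and drops the skipped rows, then stops after `limit`.
    return list(islice(rows, offset, offset + limit))


# Stand-in for a lazily paged result set.
fake_rows = iter(range(100))
print(offset_page(fake_rows, 20, 10))
```

For deep pages it is usually cheaper to hand the client an opaque "next page"
marker, such as the driver's paging state, and keep the offset path only for
old clients.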



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






On Wed, 21 Jun 2017 02:44:17 -0400, web master socketman2...@gmail.com wrote:




I am migrating from MySQL to Cassandra. In MySQL I use OFFSET and LIMIT to
paginate. The problem is that we have an Android client that requests the next
page by POSTing OFFSET and LIMIT to the server, so I don't know how to migrate
to Cassandra and keep backward compatibility.

Is there a technique for this problem?









RE: COUNT

2017-06-21 Thread ZAIDI, ASAD A
Is it possible for you to share tracing info for the query? You can enable
tracing at the cqlsh prompt with the command

cqlsh> TRACING ON
cqlsh> <run your query>

Tracing session info should then be printed on screen.

Tracing will show us where most of the time is spent.

From: web master [mailto:socketman2...@gmail.com]
Sent: Wednesday, June 21, 2017 1:44 AM
To: user@cassandra.apache.org
Subject: COUNT

I have this schema

CREATE TABLE IF NOT EXISTS "inbox" (
  "groupId"   BIGINT,
  "createTime"   TIMEUUID,
  "mailId"   BIGINT,
  "body" TEXT,
  PRIMARY KEY ("groupId","createTime","mailId")
)WITH CLUSTERING ORDER BY ("createTime" DESC);

This table is frequently updated (250K per second), and between 10 and 1000
new records are inserted into each "groupId" per day.

The problem is that I want to count unread mails, based on a TIMEUUID
comparison; that is, I want to run

SELECT count(1) FROM inbox WHERE "groupId"=123456 AND "createTime">
specificTimeUUID

But this query is inefficient and sometimes slow.

With fewer than 1000 unread messages there is no problem, but with 50K+
unread messages we have a huge issue.

What is the best solution for this problem?


Re: LIKE

2017-06-21 Thread @Nandan@
If you are sure that you want to use LIKE, then you can go with a SASI index.

https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/useSASIIndex.html
Hope this helps.

On Wed, Jun 21, 2017 at 2:44 PM, web master  wrote:

> I have this table
>
> CREATE TABLE users_by_username (
> username text PRIMARY KEY,
> email text,
> age int
> )
>
> I want to run a query like the following
>
> select username from users_by_username where username LIKE 'shl%' LIMIT 10;
>
>
> I always want to find only 10 usernames (case insensitive) that start
> with specific characters. How can I do this efficiently? I want to read
> as few partitions as possible and get the best performance.
>
>


Pagination

2017-06-21 Thread web master
I am migrating from MySQL to Cassandra. In MySQL I use OFFSET and LIMIT to
paginate. The problem is that we have an Android client that requests the next
page by POSTing OFFSET and LIMIT to the server, so I don't know how to migrate
to Cassandra and keep backward compatibility.

Is there a technique for this problem?


Ordering by last inserted row

2017-06-21 Thread web master
I have this schema

CREATE TABLE IF NOT EXISTS "blog" (
blog_id INT,
post_id INT,
body TEXT,
PRIMARY KEY (blog_id,post_id)
)WITH CLUSTERING ORDER BY (post_id DESC);

I want to get a list of blog_id values sorted by post_id; that is, if I have
blog_id IN (1,2,3,4,5,6,7,8,9,10), I want them sorted by post_id.


I think the best way is a materialized view, but I don't know how to
implement it.


LIKE

2017-06-21 Thread web master
I have this table

CREATE TABLE users_by_username (
username text PRIMARY KEY,
email text,
age int
)

I want to run a query like the following

select username from users_by_username where username LIKE 'shl%' LIMIT 10;


I always want to find only 10 usernames (case insensitive) that start with
specific characters. How can I do this efficiently? I want to read as few
partitions as possible and get the best performance.


COUNT

2017-06-21 Thread web master
I have this schema


CREATE TABLE IF NOT EXISTS "inbox" (
  "groupId"   BIGINT,
  "createTime"   TIMEUUID,
  "mailId"   BIGINT,
  "body" TEXT,
  PRIMARY KEY ("groupId","createTime","mailId")
)WITH CLUSTERING ORDER BY ("createTime" DESC);


This table is frequently updated (250K per second), and between 10 and 1000
new records are inserted into each "groupId" per day.

The problem is that I want to count unread mails, based on a TIMEUUID
comparison; that is, I want to run

SELECT count(1) FROM inbox WHERE "groupId"=123456 AND "createTime">
specificTimeUUID

But this query is inefficient and sometimes slow.


With fewer than 1000 unread messages there is no problem, but with 50K+
unread messages we have a huge issue.


What is the best solution for this problem?


Count limit

2017-06-21 Thread web master
According to
http://www.maigfrga.ntweb.co/counting-indexing-and-ordering-cassandra

SELECT COUNT(*) FROM product limit 5000;

should return no more than 5000, but it does not: it counts every row. Why?

Somebody told me it is because the count returns a single-row result, and
somebody else told me it is a bug in the new version of Cassandra.


How can I stop the count early and cap it?


Question about materialized view

2017-06-21 Thread web master
Assume this schema

CREATE TABLE t(
a int,
b int,
c int,
d int,
e text,
f date,
g int,
PRIMARY KEY (a,b)
)


If we create the following materialized view

CREATE MATERIALIZED VIEW t_mv as
select a,b,c,d from t where c is not null and d is not null
 PRIMARY KEY (c,d,a,b);


What happens if we run this query?

UPDATE t SET g=1 WHERE a=10 AND b = 20


As you can see, "g" is excluded from "t_mv". I want to know what Cassandra
does internally.

Is there any overhead for t_mv, or does Cassandra detect that nothing changes
for t_mv and skip the operation?


For example, if we have 10 materialized views like the one above, does an
update to a column that is excluded from the views impact performance, or is
the performance equal to when there are no materialized views?


tombstone limit reaches 100K cells

2017-06-21 Thread web master
According to
http://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_when_use_index_c.html#concept_ds_sgh_yzz_zj__upDatIndx

> Cassandra stores tombstones in the index until the tombstone limit
> reaches 100K cells. After exceeding the tombstone limit, the query that
> uses the indexed value will fail.


1- Does the same rule apply when updating a column that is a clustering column
of a materialized view?

2- Is it a bad idea to use a frequently updated column as a clustering column
of a materialized view? If yes, what is the alternative? If no, why not?