How to find total data size of a keyspace.

2017-02-28 Thread anuja jain
Hi,
nodetool cfstats gives me the data size of each table/column family, and
nodetool ring gives me the load of all keyspaces in the cluster, but I need
the total data size of one keyspace in the cluster. How can I get that?


Re: How to find total data size of a keyspace.

2017-03-05 Thread anuja jain
nodetool status <keyspace_name> or nodetool ring <keyspace_name> still gives
the load of all keyspaces on the cluster.
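
One possible workaround, sketched here under assumptions rather than as a
confirmed method: since nodetool cfstats already reports a size per table,
the per-table "Space used (live)" figures can be summed per keyspace. The
sample text below is a made-up excerpt, and the exact labels differ between
Cassandra versions, so the parsing pattern is an assumption too.

```python
import re

# Hypothetical excerpt of `nodetool cfstats` output from one node; the exact
# labels vary between Cassandra versions, so treat this text as an assumption.
SAMPLE_CFSTATS = """\
Keyspace: ks1
\tRead Count: 10
\t\tTable: users
\t\tSpace used (live): 1048576
\t\tSpace used (total): 1048576
\t\tTable: events
\t\tSpace used (live): 2097152
Keyspace: ks2
\t\tTable: logs
\t\tSpace used (live): 4096
"""


def keyspace_live_bytes(cfstats_text):
    """Sum the 'Space used (live)' figure of every table, per keyspace."""
    totals = {}
    current = None
    for line in cfstats_text.splitlines():
        line = line.strip()
        if line.startswith("Keyspace:"):
            # A new keyspace section starts; begin a fresh running total.
            current = line.split(":", 1)[1].strip()
            totals[current] = 0
        elif current is not None:
            match = re.match(r"Space used \(live\)[^:]*:\s*(\d+)", line)
            if match:
                totals[current] += int(match.group(1))
    return totals


totals = keyspace_live_bytes(SAMPLE_CFSTATS)
print(totals["ks1"])
```

This yields a figure for one node only. For a cluster-wide number the same
sum would have to be taken on every node and added up, and the result counts
every replica, so it is the on-disk size rather than the logical data size.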

On Tue, Feb 28, 2017 at 6:56 PM, Surbhi Gupta 
wrote:

> nodetool status keyspace_name
> On Tue, Feb 28, 2017 at 4:53 AM anuja jain  wrote:
>
>> Hi,
>> Using nodetool cfstats gives me data size of each  table/column family
>> and nodetool ring gives me load of all keyspace in cluster but I need total
>> data size of one keyspace in the cluster. How can I get that?
>>
>>
>>


Frozen types in cassandra.

2017-03-05 Thread anuja jain
Hi,
Is there a difference between creating a column of type frozen<list<double>>
and one of type frozen<list_double>, where list_double is a UDT of type
frozen<list<double>>?
Also, how do I create a Solr index on such columns?


Secondary indices on boolean type columns

2015-10-08 Thread anuja jain
I have two questions:
1. Does creating a secondary index on low-cardinality columns, such as those
of boolean type, help read performance in any way? There will be only two
values (true and false) for that column in the index table.

2. Should secondary indexes be created on clustering columns even if these
columns are not frequently used in the WHERE clause of a query?


Re: Is replication possible with already existing data?

2015-10-09 Thread anuja jain
Hi Ajay,


On Fri, Oct 9, 2015 at 9:00 AM, Ajay Garg  wrote:

> On Thu, Oct 8, 2015 at 9:47 AM, Ajay Garg  wrote:
> > Thanks Eric for the reply.
> >
> >
> > On Thu, Oct 8, 2015 at 1:44 AM, Eric Stevens  wrote:
> >> If you're at 1 node (N=1) and RF=1 now, and you want to go N=3 RF=3, you
> >> ought to be able to increase RF to 3 before bootstrapping your new nodes,
> >> with no downtime and no loss of data (even temporary).  Effective RF is
> >> min-bounded by N, so temporarily having RF > N ought to behave as RF = N.
> >>
> >> If you're starting at N > RF and you want to increase RF, things get
> >> hairier if you can't afford temporary consistency issues.
> >>
> >
> > We are ok with temporary consistency issues.
> >
> > Also, I was going through the following articles
> >
> https://10kloc.wordpress.com/2012/12/27/cassandra-chapter-5-data-replication-strategies/
> >
> > and following doubts came up in my mind ::
> >
> >
> > a)
> > Let's say at site-1, Application-Server (APP1) uses the two
> > Cassandra-instances (CAS11 and CAS12), and APP1 generally uses CAS11 for
> > all its needs (of course, whatever happens on CAS11, the same is replicated
> > to CAS12 at Cassandra-level).
> >
> > Now, if CAS11 goes down, will it be the responsibility of APP1 to "detect"
> > this and pick up CAS12 for its needs?
> > Or some automatic Cassandra-magic will happen?
> >
>
In this case, it is the responsibility of APP1 to open a connection to
CAS12. On the other hand, if APP1 connects to Cassandra using the Java
driver, you can add multiple contact points (CAS11 and CAS12 here) so that
if CAS11 is down it will connect to CAS12 directly.
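
To illustrate the idea of multiple contact points, here is a minimal sketch
in plain Python. It is not the Java driver's API: connect_first_available
and fake_connect are hypothetical names invented for this example, and the
"cluster" is simulated so that CAS11 is down and CAS12 is up.

```python
def connect_first_available(contact_points, connect):
    """Try each contact point in order and return the first that connects.

    `connect` is a hypothetical callable standing in for a driver's
    connection attempt; it should raise ConnectionError on failure.
    """
    errors = {}
    for host in contact_points:
        try:
            return host, connect(host)
        except ConnectionError as exc:
            # Remember the failure and move on to the next contact point.
            errors[host] = exc
    raise ConnectionError(f"no contact point reachable: {sorted(errors)}")


def fake_connect(host):
    # Simulated cluster state for the example: CAS11 is down, CAS12 is up.
    if host == "CAS11":
        raise ConnectionError("CAS11 is down")
    return f"session-to-{host}"


host, session = connect_first_available(["CAS11", "CAS12"], fake_connect)
print(host, session)
```

With a real driver the application only supplies the list of contact points;
the retry logic shown here is what the driver is expected to handle for you.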

> > b)
> > In the same above scenario, let's say before CAS11 goes down, the amount of
> > data in both CAS11 and CAS12 was "x".
> >
> > After CAS11 goes down, the data is being put in CAS12 only.
> > After some time, CAS11 comes back up.
> >
> > Now, data in CAS11 is still "x", while data in CAS12 is "y" (obviously,
> > "y" > "x").
> >
> > Now, will the additional ("y" - "x") data be automatically
> > put/replicated/whatever back in CAS11 through Cassandra?
> > Or it has to be done manually?
> >
>
In such a case, CAS12 will store hints for the data that should have been
written to CAS11 (whose tokens lie within the range of tokens CAS11 holds),
and whenever CAS11 is up again, the hints will be transferred to it so that
it receives the writes it missed.
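
The hint mechanism described above can be pictured with a toy model. This is
only a sketch under simplifying assumptions: Node, write and replay_hints are
made-up names, there are just two replicas, and real Cassandra stores and
replays hints quite differently internally.

```python
class Node:
    """Toy replica: a key/value store plus hints held for a down peer."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (target, key, value) writes the target missed


def write(coordinator, replicas, key, value):
    """Apply a write to every live replica; store a hint for each down one."""
    for replica in replicas:
        if replica.up:
            replica.data[key] = value
        else:
            coordinator.hints.append((replica, key, value))


def replay_hints(coordinator):
    """Deliver stored hints to targets that are back up; keep the rest."""
    remaining = []
    for target, key, value in coordinator.hints:
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    coordinator.hints = remaining


cas11, cas12 = Node("CAS11"), Node("CAS12")
write(cas12, [cas11, cas12], "k1", "x")  # both replicas are up
cas11.up = False
write(cas12, [cas11, cas12], "k2", "y")  # CAS11 misses this write
cas11.up = True
replay_hints(cas12)
print(cas11.data == cas12.data)
```

Hints are only kept for a limited window (max_hint_window_in_ms in
cassandra.yaml), so after a longer outage a repair is still needed to bring
the returning node fully up to date.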


> >
> > If there are easy recommended solutions to above, I am beginning to think
> > that a 2*2 (2 nodes each at 2 data-centres) will be the ideal setup
> > (allowing failures of entire site, or a few nodes on the same site).
> >
> > I am sorry for asking such newbie questions, and I will be grateful if these
> > silly questions could be answered by the experts :)
> >
> >
> > Thanks and Regards,
> > Ajay
>
>
>
> --
> Regards,
> Ajay
>


Re: Running sstableloader from every node when migrating?

2015-11-30 Thread anuja jain
Hello George,
You can use sstable2json to create the json of your keyspace and then load
this json to your keyspace in new cluster using json2sstable utility.

On Tue, Dec 1, 2015 at 3:06 AM, Robert Coli  wrote:

> On Thu, Nov 19, 2015 at 7:01 AM, George Sigletos 
> wrote:
>
>> We would like to migrate one keyspace from a 6-node cluster to a 3-node
>> one.
>>
>
> http://www.pythian.com/blog/bulk-loading-options-for-cassandra/
>
> =Rob
>
>


Sorting & pagination in apache cassandra 2.1

2016-01-07 Thread anuja jain
Hi all,
Suppose I have a Cassandra table with this structure:
CREATE TABLE test.t1 (
col1 text,
col2 text,
col3 text,
col4 text,
PRIMARY KEY (col1, col2, col3, col4)
) WITH CLUSTERING ORDER BY (col2 ASC, col3 ASC, col4 ASC);

and it has following data

 col1 | col2 | col3 | col4
------+------+------+------
  abc |  abc |  abc |  abc

and I query the table with
select * from t1 where col1='abc' order by col3;

it gives me the following error:
InvalidRequest: code=2200 [Invalid query] message="Order by currently only
support the ordering of columns following their declared order in the
PRIMARY KEY"

While reading the docs I learned that only the first clustering column
can be ordered by independently; for the other columns we need to follow
the sequence of the clustering columns.
My question is: what is the alternative if we need to order by col3 or col4
in my example above without including col2 in the ORDER BY clause?


Thanks,
Anuja


Re: Sorting & pagination in apache cassandra 2.1

2016-01-11 Thread anuja jain
What is the alternative if my Cassandra version is prior to 3.0
(specifically 2.1) and is already in production?

Also, the docs at

https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/srch/srchCapazty.html
say that we need to do capacity planning if we want to search using Solr.
What does that mean, and what is the alternative when we do not know the
size of the data?

 Thanks,

Anuja



On Fri, Jan 8, 2016 at 12:15 AM, Tyler Hobbs  wrote:

>
> On Thu, Jan 7, 2016 at 6:45 AM, anuja jain  wrote:
>
>> My question is, what is the alternative if we need to order by col3 or
>> col4 in my above example without including col2 in order by clause.
>>
>
> The server-side alternative is to create a second table (or a materialized
> view, if you're using 3.0+) that uses a different clustering order.
> Cassandra purposefully only supports simple and efficient queries that can
> be handled quickly (with a few exceptions), and arbitrary ordering is not
> part of that, especially if you consider complications like paging.
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
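
Besides the server-side option quoted above, the ordering can also be done
on the client. The sketch below assumes the whole partition (for example the
result of WHERE col1 = 'abc') is small enough to fetch into memory; the rows
are made-up sample data and sorted_page is a hypothetical helper, not part of
any driver.

```python
from operator import itemgetter


def sorted_page(rows, sort_key, page, page_size):
    """Sort rows client-side by one column and return a single page.

    `rows` is assumed to be the full result of a single-partition query,
    already fetched into memory as a list of dicts.
    """
    ordered = sorted(rows, key=itemgetter(sort_key))
    start = page * page_size
    return ordered[start:start + page_size]


# Made-up rows for one partition of test.t1, as a driver might return them.
rows = [
    {"col1": "abc", "col2": "a", "col3": "z", "col4": "1"},
    {"col1": "abc", "col2": "b", "col3": "x", "col4": "2"},
    {"col1": "abc", "col2": "c", "col3": "y", "col4": "3"},
]

first_page = sorted_page(rows, "col3", page=0, page_size=2)
print([r["col3"] for r in first_page])
```

This only works while a partition stays small; for large partitions a second
table with a different clustering order, as suggested above, avoids sorting
on every request.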


Re: Sorting & pagination in apache cassandra 2.1

2016-01-11 Thread anuja jain
One more question: what is meant by "cassandra inherently sorts data"?
For example,
I have a table with this schema:

CREATE TABLE users (
  user_name varchar PRIMARY KEY,
  password varchar,
  gender varchar,
  session_token varchar,
  state varchar,
  birth_year bigint
);

I inserted the data in random order, but when I run a select statement I get
the data sorted by birth_year. Why does this happen?

cqlsh:learning> select * from users;

 user_name | birth_year | gender | password | session_token | state
-----------+------------+--------+----------+---------------+---------
      John |       1979 |      M |     qwer |           abc |      JK
   Dharini |       1980 |      F |      Xyz |           abc | Gujarat
     Keval |       1990 |      M |      DDD |           abc |      WB

On Tue, Jan 12, 2016 at 12:52 PM, anuja jain  wrote:

> What is the alternative if my cassandra version is prior to 3.0
> (specifically 2.1) and which is already in production.?
>
> Also as per the docs given at
>
>
> https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/srch/srchCapazty.html
>  what does it mean by we need to do capacity planning if we need to
> search using SOLR. What is other alternative when we do not know the size
> of the data ?
>
>  Thanks,
>
> Anuja
>
>
>
> On Fri, Jan 8, 2016 at 12:15 AM, Tyler Hobbs  wrote:
>
>>
>> On Thu, Jan 7, 2016 at 6:45 AM, anuja jain  wrote:
>>
>>> My question is, what is the alternative if we need to order by col3 or
>>> col4 in my above example without including col2 in order by clause.
>>>
>>
>> The server-side alternative is to create a second table (or a
>> materialized view, if you're using 3.0+) that uses a different clustering
>> order.  Cassandra purposefully only supports simple and efficient queries
>> that can be handled quickly (with a few exceptions), and arbitrary ordering
>> is not part of that, especially if you consider complications like paging.
>>
>>
>> --
>> Tyler Hobbs
>> DataStax <http://datastax.com/>
>>
>
>


Re: Sorting & pagination in apache cassandra 2.1

2016-01-12 Thread anuja jain
I understand the meaning of SSTable, but what is the reason behind sorting
the table on the basis of int columns first?
Is there any data type preference in Cassandra?
Also, what is the alternative to creating materialized views if my Cassandra
version is prior to 3.0 (specifically 2.1) and is already in
production?


On Wed, Jan 13, 2016 at 12:17 AM, Robert Coli  wrote:

> On Mon, Jan 11, 2016 at 11:30 PM, anuja jain  wrote:
>
>> 1 more question, what does it mean by "cassandra inherently sorts data"?
>>
>
> SSTable = Sorted Strings Table.
>
> It doesn't contain "Strings" anymore, really, but that's a hint.. :)
>
> =Rob
>


Re: Sorting & pagination in apache cassandra 2.1

2016-01-14 Thread anuja jain
@Jonathan
What do you mean by "you'll need to maintain your own materialized view
tables"?
Does it mean we have to create a new table for each query?

On Wed, Jan 13, 2016 at 7:40 PM, Narendra Sharma 
wrote:

> In the example you gave, the primary key user_name is the row key. Since
> the default partitioner is random, you are getting rows in random order.
>
> Since each row has no clustering column, there is no further grouping of data.
> Or, in simple terms, each row has one record and is being returned ordered by
> column name.
>
> To see some meaningful ordering there should be some clustering column
> defined.
>
> You can create additional column families to maintain ordering. Or use
> external solutions like Elasticsearch.
> On Jan 12, 2016 10:07 PM, "anuja jain"  wrote:
>
>> I understand the meaning of SSTable but whats the reason behind sorting
>> the table on the basis of int columns first..
>> Is there any data type preference in cassandra?
>> Also What is the alternative to creating materialised views if my
>> cassandra version is prior to 3.0 (specifically 2.1) and which is already
>> in production.?
>>
>>
>> On Wed, Jan 13, 2016 at 12:17 AM, Robert Coli 
>> wrote:
>>
>>> On Mon, Jan 11, 2016 at 11:30 PM, anuja jain 
>>> wrote:
>>>
>>>> 1 more question, what does it mean by "cassandra inherently sorts data"?
>>>>
>>>
>>> SSTable = Sorted Strings Table.
>>>
>>> It doesn't contain "Strings" anymore, really, but that's a hint.. :)
>>>
>>> =Rob
>>>
>>
>>
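
The point quoted above, that rows without a clustering column come back in
partitioner order rather than in the order of any column, can be sketched
as follows. toy_token is a stand-in built on MD5 purely for illustration; it
is not Cassandra's default Murmur3 partitioner, so the resulting order will
not match a real cluster.

```python
import hashlib


def toy_token(partition_key):
    """Illustrative token: an integer derived from an MD5 hash of the key.

    This is a stand-in for illustration, not Cassandra's Murmur3Partitioner.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def scan_order(partition_keys):
    """Return keys in the order a full-table scan would visit them: by token."""
    return sorted(partition_keys, key=toy_token)


inserted = ["Keval", "John", "Dharini"]
print(scan_order(inserted))
```

The order depends only on the hash of user_name, not on the insertion order
or on birth_year, so a result that happens to look sorted by birth_year is
most likely a coincidence of a three-row table.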


solr Textsearch in dse 4.8.3

2016-01-19 Thread anuja jain
Hi,
I am using solr of dse 4.8.3 to do text search on cassandra data.
On a String type column, when I use the regex email:*gmail* it does not
return the data that was inserted after starting Cassandra in Solr mode.
In fact, every time I run the query it returns a different result.
Schema.xml has the following entries for the email column:
 



  


What settings do I need to do for it?

Thanks,
Anuja


Re: Frozen types in cassandra.

2017-03-29 Thread anuja jain
Thanks Tyler.

On Tue, Mar 7, 2017 at 10:54 PM, Tyler Hobbs  wrote:

>
> On Sun, Mar 5, 2017 at 11:53 PM, anuja jain  wrote:
>
>> Is there is difference between creating column of type
>> frozen<list<double>> and frozen<list_double> where list_double is UDT of
>> type frozen<list<double>> ?
>>
>
> Yes, there is a difference in serialization format: the first will be
> serialized directly as a list, the second will be serialized as a
> single-field UDT containing a list.
>
> Additionally, the second form supports altering the type by adding fields
> to the UDT.  This can't be done with the first form.  If you don't need
> this capability, I recommend going with the simpler option of
> frozen<list<double>>.
>
>
>> Also how to create a solr index on such columns?
>>
>
> I have no idea, sorry.
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>


Can we get username and timestamp in cqlsh_history?

2017-03-29 Thread anuja jain
Hi,
I have a Cassandra cluster with a lot of keyspaces and users. I want to
get the history of CQL commands along with the username and the time at
which each command was run.
Also, if we run some commands from GUI tools like DevCenter or DBeaver,
can we log those commands too? If yes, how?

Thanks,
Anuja


Re: Can we get username and timestamp in cqlsh_history?

2017-04-12 Thread anuja jain
Thanks Nicolas. That is exactly what I was looking for.

On Tue, Apr 4, 2017 at 12:08 AM, Durity, Sean R  wrote:

> Sounds like you want full auditing of CQL in the cluster. I have not seen
> anything built into the open source version for that (but I could be
> missing something). DataStax Enterprise does have an auditing feature.
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* anuja jain [mailto:anujaja...@gmail.com]
> *Sent:* Wednesday, March 29, 2017 7:37 AM
> *To:* user@cassandra.apache.org
> *Subject:* Can we get username and timestamp in cqlsh_history?
>
>
>
> Hi,
>
> I have a cassandra cluster having a lot of keyspaces and users. I want to
> get the history of cql commands along with the username and the time at
> which the command is run.
>
> Also if we are running some commands from GUI tools like
> Devcenter,dbeaver, can we log those commands too? If yes, how?
>
>
>
> Thanks,
>
> Anuja
>
>


How to change frozen to non frozen columns in cassandra

2017-04-12 Thread anuja jain
Hi ,
I have a table with columns of type frozen>
I want to convert it to simple list
How can I do that without dropping the existing column? I have data in that
column.
I am using dse 4.8.11

Thanks,
Anuja


Re: [Marketing Mail] Re: nodetool status high load info

2017-04-12 Thread anuja jain
Do you perform a lot of deletes or updates on your database?
On restart, it performs a major compaction, which can reduce the load on your
node by removing stale data.
Try configuring compaction in your conf to perform minor compactions, i.e.
compactions at a regular interval.

Thanks,
Anuja

On Wed, Apr 12, 2017 at 3:02 PM, Osman YOZGATLIOGLU <
osman.yozgatlio...@krontech.com> wrote:

> Hello,
>
> Here is the problem with the loads: the first node shows 206 TB of data.
> After a Cassandra restart it shows 51 TB, which matches what df shows.
>
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens   Owns (effective)  Host ID  Rack
> UN  x.x.x.1  206 TB 256  50.6% xx  rack1
> UN  x.x.x.2  190.77 TB  256  49.9% yy  rack1
> ..
>
> --  Address   Load   Tokens   Owns (effective)  Host ID  Rack
> UN  x.x.x.1  51.01 TB   256  50.6% xx  rack1
> UN  x.x.x.2  49.84 TB   256  49.9% yy  rack1
> ..
>
>
> nodetool tpstats;
> Pool Name                   Active   Pending     Completed   Blocked  All time blocked
> MutationStage                    2   175536494778                  0                 0
> ViewMutationStage                0         0             0         0                 0
> ReadStage                        0         0         41402         0                 0
> RequestResponseStage             0         0   35515109625         0                 0
> ReadRepairStage                  0         0             3         0                 0
> CounterMutationStage             0         0             0         0                 0
> MiscStage                        0         0             0         0                 0
> CompactionExecutor               5         5        732161         0                 0
> MemtableReclaimMemory            0         0        198602         0                 0
> PendingRangeCalculator           0         0            11         0                 0
> GossipStage                      0         0       3854373         0                 0
> SecondaryIndexManagement         0         0             0         0                 0
> HintsDispatcher                  1         7             6         0                 0
> MigrationStage                   0         0             6         0                 0
> MemtablePostFlush                0         0        200265         0                 0
> ValidationExecutor               0         0             0         0                 0
> Sampler                          0         0             0         0                 0
> MemtableFlushWriter              0         0        198602         0                 0
> InternalResponseStage            0         0       5209219         0                 0
> AntiEntropyStage                 0         0             0         0                 0
> CacheCleanupExecutor             0         0             0         0                 0
> Native-Transport-Requests        0         0   15910719923         0         192131887
>
> Message type         Dropped
> READ                       0
> RANGE_SLICE                0
> _TRACE                     0
> HINT                       0
> MUTATION                 185
> COUNTER_MUTATION           0
> BATCH_STORE                0
> BATCH_REMOVE               0
> REQUEST_RESPONSE           0
> PAGED_RANGE                0
> READ_REPAIR                0
>
> sar values;
> 05:10:01     CPU     %user     %nice   %system   %iowait    %steal     %idle
> 05:20:01     all     26.96     16.09      3.73      2.23      0.00     50.99
> 05:30:02     all     26.99     16.83      3.82      2.86      0.00     49.50
> 05:40:01     all     27.17     18.19      3.83      0.89      0.00     49.91
> 05:50:01     all     27.16     18.74      3.80      0.28      0.00     50.02
> 06:00:01     all     26.30     19.88      3.88      0.29      0.00     49.64
> 06:10:01     all     28.02     21.11      3.91      0.28      0.00     46.68
> 06:20:01     all     28.37     19.64      3.98      0.40      0.00     47.61
> 06:30:01     all     29.56     19.51      4.08      0.45      0.00     46.40
> 06:40:01     all     29.28     20.56      4.08      0.34      0.00     45.74
> 06:50:01     all     29.46     19.15      3.99      0.19      0.00     47.20
> 07:00:01     all     29.45     21.09      4.07      0.26      0.00     45.13
> 07:10:01     all     29.23     21.59      4.18      0.29      0.00     44.71
> 07:20:01     all     30.78     21.24      4.09      0.48      0.00     43.40
> 07:30:01     all     29.06     21.63      4.09      0.27      0.00     44.94
> 07:40:01     all     28.84     21.85      4.13      1.76      0.00     43.41
> 07:50:01     all     29.22     21.35      4.14      2.53      0.00     42.76
> 08:00:01     all     30.10     21.66      4.24      2.39      0.00     41.60
> 08:10:01all 28.63