Re: Freeing up disk space on Cassandra 1.1.5 with Size-Tiered compaction.

2012-12-05 Thread Alexandru Sicoe
 Major performance hit on node (very bad for us because
> need to be taking data all the time)."
> >
> > Actually, major compaction *does not* stop minor compactions. What
> happens is that due to the size of the sstable that remains after your
> major compaction, it will never be compacted with the upcoming new
> sstables, and because of that, your read performance will go down until
> you run another major compaction.
> >
> > "2. Switch to Leveled compaction strategy.
> >   - It is mentioned to help with deletes and disk space usage. Can
> someone confirm?"
> >
> > From what I know, Leveled compaction will not free disk space. It will
> allow you to use a greater percentage of your total disk space (50% max
> for size-tiered compaction vs about 80% for leveled compaction).
> >
> > "Our usage pattern is write once, read once (export) and delete once! "
> >
> > In this case, I think that leveled compaction fits your needs.
> >
> > "Can anyone suggest which (if any) is better? Are there better
> solutions?"
> >
> > Are your sstables compressed? There are 2 types of built-in compression
> and you may use them depending on the model of each of your CFs.
> >
> > see:
> http://www.datastax.com/docs/1.1/operations/tuning#configure-compression
> >
> > Alain
> >
> > 2012/11/22 Alexandru Sicoe 
> > We are running a 3 node Cassandra 1.1.5 cluster with a 3TB Raid 0 disk
> per node for the data dir and separate disk for the commitlog, 12 cores, 24
> GB RAM (12GB to Cassandra heap).
> >
>
>


Freeing up disk space on Cassandra 1.1.5 with Size-Tiered compaction.

2012-11-22 Thread Alexandru Sicoe
Hello everyone,

We are running a 3 node Cassandra 1.1.5 cluster with a 3TB Raid 0 disk per
node for the data dir and separate disk for the commitlog, 12 cores, 24 GB
RAM (12GB to Cassandra heap).

We now have 1.1 TB worth of data per node (RF = 2).

Our data input is between 20 to 30 GB per day, depending on operating
conditions of the data sources.

Problem is we have to start deleting data because we will hit the capacity.

From reading around we see we have 2 options:

1. Start issuing regular major compactions (nodetool compact).
 - This is not recommended:
- Stops minor compactions.
- Major performance hit on the node (very bad for us because we need
to be taking data all the time).

2. Switch to Leveled compaction strategy.
  - It is mentioned to help with deletes and disk space usage. Can
someone confirm?

Can anyone suggest which (if any) is better? Are there better solutions?
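
For concreteness, here is roughly what the two options would translate to on
our 1.1.5 cluster (keyspace/CF names are placeholders and the cassandra-cli
syntax is from memory, so please correct me if it is off):

  # option 1: force a major compaction of one CF on a node
  nodetool -h <node_ip> compact MyKeyspace MyCF

  # option 2: switch the CF to Leveled compaction (from cassandra-cli)
  update column family MyCF
    with compaction_strategy = 'LeveledCompactionStrategy'
    and compaction_strategy_options = {sstable_size_in_mb: 10};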

Disclaimer:
- Our usage pattern is write once, read once (export) and delete once!
Basically we are using Cassandra as a data buffer between our collection
points and a long term back-up system (it should provide a time window e.g.
1 month of data before data gets deleted from the cluster).

- Due to financial and space constraints it is very unlikely we can add
more nodes to the cluster.

- We were thinking of relying on the automatic minor compactions to free up
space for us, but given the way the Size-Tiered compaction strategy seems to
work, we will hit capacity before we manage to free up disk space (this seems
fundamental: no matter how much disk space you have per node, the data files
will get larger and larger and you will eventually hit the same problem of
minor compactions not freeing space fast enough - can someone confirm?)

Cheers,
Alex


what's the most 1.1 stable version?

2012-10-05 Thread Alexandru Sicoe
Hello,
 We are planning to upgrade from version 1.0.7 to the 1.1 branch. Which is
the stable version that people are using? I see the latest release is 1.1.5
but maybe it's not fully wise to use this. Is 1.1.4 the one to use?

Cheers,
Alex


Re: repair never finishing 1.0.7

2012-06-25 Thread Alexandru Sicoe
Hi Andras,

I am not using a VPN. The system has been running successfully in this
configuration for a couple of weeks until I noticed the repair is not
working.

What happens is that I configure iptables on each Cassandra node to forward
packets that are sent to any of the IPs in the other DC (on ports 7000, 9160
and 7199) to the gateway IP. The gateway does the NAT, sending the packets on
the other side to the real destination IP, having replaced the source IP with
the initial sender's IP (at least in my understanding of it).
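
To be concrete, the rule I add on each node looks roughly like this (IPs are
placeholders and I am paraphrasing my actual rules slightly):

  # redirect traffic destined for a node in the other DC to the gateway
  iptables -t nat -A OUTPUT -d <other_dc_node_ip> -p tcp \
    -m multiport --dports 7000,9160,7199 \
    -j DNAT --to-destination <gateway_ip>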

What might be the problem given the configuration? How to fix this?

Cheers,
Alex

On Mon, Jun 25, 2012 at 12:47 PM, Andras Szerdahelyi <
andras.szerdahe...@ignitionone.com> wrote:

>
>   The DCs are communicating over a gateway where I do NAT for ports 7000,
> 9160 and 7199.
>
>
>  Ah, that sounds familiar. You don't mention if you are VPN'd or not.
> I'll assume you are not.
>
>  So, your nodes are behind network address translation - is that to say
> they advertise ( broadcast ) their internal or translated/forwarded IP to
> each other? Setting up a Cassandra ring across NAT ( without a VPN ) is
> impossible in my experience. Either the nodes on your local network won't
> be able to communicate with each other, because they broadcast their
> translated ( public ) address which is normally ( router configuration )
> not routable from within the local network, or the nodes broadcast their
> internal IP, in which case the "outside" nodes are helpless in trying to
> connect to a local net. On DC2 nodes/the node you issue the repair on,
> check for any sockets being opened to the internal addresses of the nodes
> in DC1.
>
>
>  regards,
> Andras
>
>
>
>  On 25 Jun 2012, at 11:57, Alexandru Sicoe wrote:
>
> Hello everyone,
>
>  I have a 2 DC (DC1:3 and DC2:6) Cassandra1.0.7 setup. I have about
> 300GB/node in the DC2.
>
>  The DCs are communicating over a gateway where I do NAT for ports 7000,
> 9160 and 7199.
>
>  I did a "nodetool repair" on a node in DC2 without any external load on
> the system.
>
>  It took 5 hrs to finish the Merkle tree calculations (which is fine for
> me) but then in the streaming phase nothing happens (0% seen in "nodetool
> netstats") and stays like that forever. Note: it has to stream to/from
> nodes in DC1!
>
>  I tried another time and still the same.
>
>  Looking around I found this thread
>
> http://www.mail-archive.com/user@cassandra.apache.org/msg22167.html
>  which seems to describe the same problem.
>
> The thread gives 2 suggestions:
> - a full cluster restart allows the first attempted repair to complete
> (haven't tested yet; this is not practical even if it works)
> - issue https://issues.apache.org/jira/browse/CASSANDRA-4223 can be the
> problem
>
> Questions:
> 1) How can I make sure that the JIRA issue above is my real problem? (I
> see no errors or warns in the logs; no other activity)
> 2) What should I do to make the repairs work? (If the JIRA issue is the
> problem, then I see there is a fix for it in Version 1.0.11 which is not
> released yet)
>
> Thanks,
> Alex
>
>
>


repair never finishing 1.0.7

2012-06-25 Thread Alexandru Sicoe
Hello everyone,

 I have a 2 DC (DC1:3 and DC2:6) Cassandra1.0.7 setup. I have about
300GB/node in the DC2.

 The DCs are communicating over a gateway where I do NAT for ports 7000,
9160 and 7199.

 I did a "nodetool repair" on a node in DC2 without any external load on
the system.

 It took 5 hrs to finish the Merkle tree calculations (which is fine for
me) but then in the streaming phase nothing happens (0% seen in "nodetool
netstats") and stays like that forever. Note: it has to stream to/from
nodes in DC1!
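
For reference, this is essentially what I am running and watching (keyspace
name and host are placeholders):

  nodetool -h <dc2_node_ip> repair MyKeyspace
  # in another terminal, to watch progress:
  nodetool -h <dc2_node_ip> netstats
  nodetool -h <dc2_node_ip> compactionstats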

 I tried another time and still the same.

 Looking around I found this thread

http://www.mail-archive.com/user@cassandra.apache.org/msg22167.html
 which seems to describe the same problem.

The thread gives 2 suggestions:
- a full cluster restart allows the first attempted repair to complete
(haven't tested yet; this is not practical even if it works)
- issue https://issues.apache.org/jira/browse/CASSANDRA-4223 can be the
problem

Questions:
1) How can I make sure that the JIRA issue above is my real problem? (I see
no errors or warns in the logs; no other activity)
2) What should I do to make the repairs work? (If the JIRA issue is the
problem, then I see there is a fix for it in Version 1.0.11 which is not
released yet)

Thanks,
Alex


Re: composite query performance depends on component ordering

2012-04-03 Thread Alexandru Sicoe
Hi Sylvain and Aaron,

Thanks for the comment, Sylvain; what you say makes sense. I have
microsecond-precision timestamps, and looking at some row printouts I see
everything is happening at a different timestamp, which means the comparator
never has to compare the second (~100-byte ID) component.

As for the methodology it's not so thorough. I used Cassandra 0.8.5.

What I did: I had acquired a large data set (about 300 hrs worth of data)
in Schema 1 (details below), which I found was easily hitting thousands of
rows for some queries, thus giving me very poor performance. I converted
this data to Schema 2 (details below), grouping the data together in the
same row and increasing the time bucket for the row (with two versions of
the column names, "Timestamp:ID" and "ID:Timestamp"). So I obtained a CF
with 66 rows: 11 rows for 3 different types of data sources which are
dominant in the rates of info they give me (each row is a 24 hr time
bucket).
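
For reference, the two column layouts were created roughly like this from
cassandra-cli (0.8.5); the CF names are placeholders and the exact syntax is
from memory. The first CF is the "Timestamp:ID" ordering (Long component
first), the second is "ID:Timestamp" (UTF8 component first):

  create column family data_ts_id
    with comparator = 'CompositeType(LongType, UTF8Type)';

  create column family data_id_ts
    with comparator = 'CompositeType(UTF8Type, LongType)';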

These are the results I got using the CompositeQueryIterator (with a
modified max of 100,000 cols returned per slice) taken from the composite
query tutorial at
http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1
(code is at https://github.com/zznate/cassandra-tutorial). So basically I
used null for start and end in order to read entire rows at a time. I timed
my code. The actual values are doubles for all 3 types. The size is the
file size after dumping the results to a text file.

OK, in my previous email I just looked at the rows with the max size, which
gave me a 20% difference. Averaged over all rows it is less.


Type 1

No. cols returned   File size   ExecTime ID:Timestamp (s)   ExecTime Timestamp:ID (s)   ExecTime Diff %
387174              25M         12.59                       8.6                         31.68
1005113             66M         31.83                       21.84                       31.38
579633              38M         18.07                       12.46                       31.03
1217634             81M         33.77                       24.65                       26.99
376303              24M         12.32                       10.36                       15.94
2493007             169M        68.68                       59.93                       12.74
6298275             428M        183.28                      147.57                      19.48
2777962             189M        83.16                       73.3                        11.86
6138047             416M        170.88                      155.83                      8.81
3193450             216M        93.26                       82.84                       11.18
2302928             155M        69.91                       61.62                       11.85

Avg = 19.3 %

Type 2

No. cols returned   File size   ExecTime ID:Timestamp (s)   ExecTime Timestamp:ID (s)   ExecTime Diff %
350468              40M         12.92                       13.12                       -1.59
1303797             148M        43.33                       38.98                       10.04
697763              79M         26.78                       22.05                       17.66
825414              94M         33.5                        26.69                       20.31
55075               6.2M        2.97                        2.13                        28.15
1873775             213M        72.37                       51.12                       29.37
3982433             453M        147.04                      110.71                      24.71
1546491             176M        54.86                       42.13                       23.21
4117491             468M        143.1                       114.62                      19.9
1747506             199M        63.23                       63.05                       0.28
2720160             308M        96.06                       82.47                       14.14

Avg = 16.9 %

Type 3

No. cols returned   File size   ExecTime ID:Timestamp (s)   ExecTime Timestamp:ID (s)   ExecTime Diff %
192667              7.2M        5.88                        6.5                         -10.49
210593              7.9M        6.33                        5.57                        12.06
144677              5.4M        3.78                        3.74                        1.22
207706              7.7M        6.33                        5.74                        9.28
235937              8.7M        6.34                        6.11                        3.64
159985              6.0M        4.23                        3.93                        7.07
134859              5.5M        3.91                        3.38                        13.46
70545               2.9M        2.96                        2.08                        29.84
98487               3.9M        4.04                        2.62                        35.22
205979              8.2M        7.35                        5.67                        22.87
166045              6.2M        5.12                        3.99                        22.1

Avg = 13.3 %

Just to understand why I did the tests.

Data set:
I have ~300,000 data sources. Each data source has several variables it can
output values for. There are ~12 variables / data source. This gives ~4
million independent time series (let's call them streams) that need to go
into Cassandra. The streams give me (timestamp, value) pairs at highly
different rates, depending on the data source they come from and operating
conditions. This translates into very different row lengths if a single time
bucket size is used across all streams.

The data sources can be further grouped in types (several data sources can
share the same type). There are ~100 types.

Use case:
The system
- will serve a web dashboard.
- should allow queries at the highest granularity for short periods of time
(up to 4-8 hrs) on any individual stream or grouping of streams
- should allow a method of obtaining on demand (offline) analytics over
long periods of time (up to 1 year) and then (real-time) querying on the
analytics data

Cassandra schemes used so far:
Schema 1: 1 row for each of the 3 million streams. Each row is a 4hr time
bucket.
Schema 2: 1 row for each of the 100 types. Each row is a 24hr time bucket.

Now I'm planning to use Schema 2 only, with an 8hr time bucket, to better
balance between rows that get very long and ones that don't.

Cheers,
Alex


On Sat, Mar 31, 2012 at 9:35 PM, aaron morton wrote:

> Can you post the details of the queries you are running, including the
> methodology of the tests ?
>
> (Here is the methodology I used to time queries previously
> http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/)
>
> Cheers
>
>
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 31/03/2012, at 1:29 AM, Alexandru Sicoe wrote:
>
> Hi guys,
>  I am consistently seeing a 20% improvement in query retrieval times if I
> use the composite comparator &

2 questions DataStax Enterprise

2012-04-03 Thread Alexandru Sicoe
Hi guys,
 I'm trying out DSE and looking for the best way to arrange the cluster. I
have 9 nodes: 3 behind a gateway taking in writes from my collectors and 6
outside the gateway that are supposed to take replicas from the other 3 and
serve reads and analytics jobs.

1. Is it ok to run the 3 nodes as normal Cassandra nodes and run the other
6 nodes as analytics? Can I serve both real time reads and M/R jobs from
the 6 nodes? How will these affect each other performance-wise?

I know that the way the system is supposed to be used is to separate
analytics from real time queries. I've already explored a possible 3DC
setup with Tyler in another message and it indeed works but I'm afraid it
is too complex and would require me to send 2 replicas across the firewall
which it can't handle very well at peak times, affecting other applications.

2. I started the cluster in the setup described in 1 (3 normal, 6
analytics) and as soon as the Analytics nodes start up they start
outputting this message:

INFO [TASK-TRACKER-INIT] 2012-04-03 17:54:59,575 Client.java (line 629)
Retrying connect to server: IP_OF_NORMAL_CASSANDRA_SEED_NODE:8012. Already
tried 10 time(s).


So it seems my analytics nodes are trying to contact the normal Cassandra
seed node on port 8012, which I read is a "Hadoop Job Tracker client port".
This doesn't seem like normal behavior. Why is it getting confused? In the
.yaml of each node I'm using endpoint_snitch:
com.datastax.bdp.snitch.DseSimpleSnitch and putting the Analytics seed
node before the normal Cassandra seed node in the seeds list.
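
For completeness, the relevant part of my cassandra.yaml on every node looks
roughly like this (IPs are placeholders):

  endpoint_snitch: com.datastax.bdp.snitch.DseSimpleSnitch
  seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
            - seeds: "ANALYTICS_SEED_IP,NORMAL_CASSANDRA_SEED_IP"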

Cheers,
Alex


composite query performance depends on component ordering

2012-03-30 Thread Alexandru Sicoe
Hi guys,
 I am consistently seeing a 20% improvement in query retrieval times if I
use the composite comparator "Timestamp:ID" instead of "ID:Timestamp" where
Timestamp=Long and ID=~100 character strings. I am retrieving all columns
(~1 million) from a single row. Why is this happening?

Cheers,
Alex


Re: another DataStax OpsCenter question

2012-03-30 Thread Alexandru Sicoe
Hi Nick,

I forgot to say I was using OpsCenter 1.2.3, which I think uses different
ports. So I will upgrade to 1.4.1 and open those ports across the firewall,
although that's kind of a pain. I already have about 320 config lines for
the Cassandra cluster itself.
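
In case it helps someone else, the extra firewall rules would be along these
lines (interfaces/subnets are placeholders; the real rules also involve NAT):

  # agents -> OpsCenter
  iptables -A FORWARD -p tcp --dport 61620 -s <agent_subnet> -d <opscenter_ip> -j ACCEPT
  # OpsCenter -> agents
  iptables -A FORWARD -p tcp --dport 61621 -s <opscenter_ip> -d <agent_subnet> -j ACCEPT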

So, just to make things clear, is it mandatory to have one OpsCenter
instance per Cassandra cluster? Even if that cluster is split in multiple
Cassandra DCs across separate regions?

Is there a way to have one OpsCenter per Cassandra DC (monitor Cassandra
DCs individually)? That would get rid of many configuration issues!

Cheers,
Alex

On Thu, Mar 29, 2012 at 9:35 PM, Nick Bailey  wrote:

> This setup may be possible although there are a few potential issues.
> Firstly, see:
> http://www.datastax.com/docs/opscenter/configure_opscenter#configuring-firewall-port-access
>
> Basically the agents and OpsCenter communicate on ports 61620 and
> 61621 by default (those can be configured though). The agents will
> contact the OpsCenter machine on port 61620. You can specify the
> interface the agents will use to connect to this port when
> installing/setting up the agents.
>
> The OpsCenter machine will contact the agents on port 61621. Right now
> the OpsCenter machine will only talk to the nodes using the
> listen_address configured in your cassandra conf. We have a task to
> fix this in the future so that you can configure the interface that
> opscenter will contact each agent on. In the meantime though OpsCenter
> will need to be able to hit the listen_address for each node.
>
> On Thu, Mar 29, 2012 at 12:47 PM, Alexandru Sicoe 
> wrote:
> > Hello,
> >  I am planning on testing OpsCenter to see how it can monitor a multi DC
> > cluster. There are 2 DCs each on a different side of a firewall. I've
> > configured NAT on the firewall to allow the communication between all
> > Cassandra nodes on ports 7000, 7199 and 9160. The cluster works fine.
> > However when I start OpsCenter (obviously on one side of the firewall)
> the
> > OpsCenter CF gives me two schema versions in the cluster and basically
> > messes up everything. Plus, I can only see the nodes on the same
> side.
> >
> > What are the requirements to let the OpsCenter on one side see the
> Cassandra
> > nodes and the OpsCenter agents on the other, and vice versa?
> >
> > Is it possible to use OpsCenter across a firewall?
> >
> > Cheers,
> > Alex
>


another DataStax OpsCenter question

2012-03-29 Thread Alexandru Sicoe
Hello,
 I am planning on testing OpsCenter to see how it can monitor a multi DC
cluster. There are 2 DCs each on a different side of a firewall. I've
configured NAT on the firewall to allow the communication between all
Cassandra nodes on ports 7000, 7199 and 9160. The cluster works fine.
However when I start OpsCenter (obviously on one side of the firewall) the
OpsCenter CF gives me two schema versions in the cluster and basically
messes up everything. Plus, I can only see the nodes on the same side.

What are the requirements to let the OpsCenter on one side see the
Cassandra nodes and the OpsCenter agents on the other, and vice versa?

Is it possible to use OpsCenter across a firewall?

Cheers,
Alex


Cassandra multi DC

2012-03-29 Thread Alexandru Sicoe
Hello everyone,
 How are people running multi DC Cassandra across remote locations? Are
VPNs used? Or some dedicated application proxies? What is the norm here?

Any advice is much appreciated,
Alex


Re: single row key continues to grow, should I be concerned?

2012-03-26 Thread Alexandru Sicoe
count data
>
> (I could probably combine the row directory and the column counter into a
> single counter column family, where the column name is the row key and the
> value is the counter.) A naive solution would require reading the directory
> before every read and the counter before every write--caching could
> probably help with that. So this approach would probably lead to a
> reasonable solution, but it's liable to be somewhat complex. Before I go
> much further down this path, I thought I'd run it by this group in case
> someone can point out a more clever solution.
>
> Thanks,
>
> Jim
> On Thu, Mar 22, 2012 at 5:36 PM, Alexandru Sicoe wrote:
>
>> Thanks Aaron, I'll lower the time bucket, see how it goes.
>>
>> Cheers,
>> Alex
>>
>>
>> On Thu, Mar 22, 2012 at 10:07 PM, aaron morton 
>> wrote:
>>
>>> Will adding a few tens of wide rows like this every day cause me
>>> problems on the long term? Should I consider lowering the time bucket?
>>>
>>> IMHO yeah, yup, ya and yes.
>>>
>>>
>>> From experience I am a bit reluctant to create too many rows because I
>>> see that reading across multiple rows seriously affects performance. Of
>>> course I will use map-reduce as well ...will it be significantly affected
>>> by many rows?
>>>
>>> Don't think it would make too much difference.
>>> range slice used by map-reduce will find the first row in the batch and
>>> then step through them.
>>>
>>> Cheers
>>>
>>>
>>>   -
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 22/03/2012, at 11:43 PM, Alexandru Sicoe wrote:
>>>
>>> Hi guys,
>>>
>>> Based on what you are saying there seems to be a tradeoff that
>>> developers have to handle between:
>>>
>>>"keep your rows under a certain size" vs
>>> "keep data that's queried together, on disk together"
>>>
>>> How would you handle this tradeoff in my case:
>>>
>>> I monitor about 40,000 independent timeseries streams of data. The
>>> streams have highly variable rates. Each stream has its own row and I go to
>>> a new row every 28 hrs. With this scheme, I see several tens of rows
>>> reaching sizes in the millions of columns within this time bucket (largest
>>> I saw was 6.4 million). The sizes of these wide rows are around 400 MBytes
>>> (considerably larger than 60MB).
>>>
>>> Will adding a few tens of wide rows like this every day cause me
>>> problems on the long term? Should I consider lowering the time bucket?
>>>
>>> From experience I am a bit reluctant to create too many rows because I
>>> see that reading across multiple rows seriously affects performance. Of
>>> course I will use map-reduce as well ...will it be significantly affected
>>> by many rows?
>>>
>>> Cheers,
>>> Alex
>>>
>>> On Tue, Mar 20, 2012 at 6:37 PM, aaron morton 
>>> wrote:
>>>
>>>> The reads are only fetching slices of 20 to 100 columns max at a time
>>>> from the row but if the key is planted on one node in the cluster I am
>>>> concerned about that node getting the brunt of traffic.
>>>>
>>>> What RF are you using, how many nodes are in the cluster, what CL do
>>>> you read at ?
>>>>
>>>> If you have lots of nodes that are in different racks the
>>>> NetworkTopologyStrategy will do a better job of distributing read load than
>>>> the SimpleStrategy. The DynamicSnitch can also help distribute load; see
>>>> cassandra.yaml for its configuration.
>>>>
>>>> I thought about breaking the column data into multiple different row
>>>> keys to help distribute throughout the cluster but its so darn handy having
>>>> all the columns in one key!!
>>>>
>>>> If you have a row that will continually grow it is a good idea to
>>>> partition it in some way. Large rows can slow things like compaction and
>>>> repair down. If you have something above 60MB it's starting to slow things
>>>> down. Can you partition by a date range such as month ?
>>>>
>>>> Large rows are also a little slower to query from
>>>> http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
>>>>
>>

Re: single row key continues to grow, should I be concerned?

2012-03-22 Thread Alexandru Sicoe
Thanks Aaron, I'll lower the time bucket, see how it goes.

Cheers,
Alex


On Thu, Mar 22, 2012 at 10:07 PM, aaron morton wrote:

> Will adding a few tens of wide rows like this every day cause me problems
> on the long term? Should I consider lowering the time bucket?
>
> IMHO yeah, yup, ya and yes.
>
>
> From experience I am a bit reluctant to create too many rows because I see
> that reading across multiple rows seriously affects performance. Of course
> I will use map-reduce as well ...will it be significantly affected by many
> rows?
>
> Don't think it would make too much difference.
> range slice used by map-reduce will find the first row in the batch and
> then step through them.
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/03/2012, at 11:43 PM, Alexandru Sicoe wrote:
>
> Hi guys,
>
> Based on what you are saying there seems to be a tradeoff that developers
> have to handle between:
>
>"keep your rows under a certain size" vs
> "keep data that's queried together, on disk together"
>
> How would you handle this tradeoff in my case:
>
> I monitor about 40,000 independent timeseries streams of data. The streams
> have highly variable rates. Each stream has its own row and I go to a new
> row every 28 hrs. With this scheme, I see several tens of rows reaching
> sizes in the millions of columns within this time bucket (largest I saw was
> 6.4 million). The sizes of these wide rows are around 400 MBytes
> (considerably larger than 60MB).
>
> Will adding a few tens of wide rows like this every day cause me problems
> on the long term? Should I consider lowering the time bucket?
>
> From experience I am a bit reluctant to create too many rows because I see
> that reading across multiple rows seriously affects performance. Of course
> I will use map-reduce as well ...will it be significantly affected by many
> rows?
>
> Cheers,
> Alex
>
> On Tue, Mar 20, 2012 at 6:37 PM, aaron morton wrote:
>
>> The reads are only fetching slices of 20 to 100 columns max at a time
>> from the row but if the key is planted on one node in the cluster I am
>> concerned about that node getting the brunt of traffic.
>>
>> What RF are you using, how many nodes are in the cluster, what CL do you
>> read at ?
>>
>> If you have lots of nodes that are in different racks the
>> NetworkTopologyStrategy will do a better job of distributing read load than
>> the SimpleStrategy. The DynamicSnitch can also help distribute load; see
>> cassandra.yaml for its configuration.
>>
>> I thought about breaking the column data into multiple different row keys
>> to help distribute throughout the cluster but its so darn handy having all
>> the columns in one key!!
>>
>> If you have a row that will continually grow it is a good idea to
>> partition it in some way. Large rows can slow things like compaction and
>> repair down. If you have something above 60MB it's starting to slow things
>> down. Can you partition by a date range such as month ?
>>
>> Large rows are also a little slower to query from
>> http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
>>
>> If most reads are only pulling 20 to 100 columns at a time, are there two
>> workloads? Is it possible to store just these columns in a separate row? If
>> you understand how big a row may get, you may be able to use the row cache
>> to improve performance.
>>
>> Cheers
>>
>>
>>   -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 20/03/2012, at 2:05 PM, Blake Starkenburg wrote:
>>
>> I have a row key which is now up to 125,000 columns (and anticipated to
>> grow), I know this is a far-cry from the 2-billion columns a single row key
>> can store in Cassandra but my concern is the amount of reads that this
>> specific row key may get compared to other row keys. This particular row
>> key houses column data associated with one of the more popular areas of the
>> site. The reads are only fetching slices of 20 to 100 columns max at a time
>> from the row but if the key is planted on one node in the cluster I am
>> concerned about that node getting the brunt of traffic.
>>
>> I thought about breaking the column data into multiple different row keys
>> to help distribute throughout the cluster but its so darn handy having all
>> the columns in one key!!
>>
>> key_cache is enabled but row cache is disabled on the column family.
>>
>> Should I be concerned going forward? Any particular advice on large wide
>> rows?
>>
>> Thanks!
>>
>>
>>
>
>


Re: single row key continues to grow, should I be concerned?

2012-03-22 Thread Alexandru Sicoe
Hi guys,

Based on what you are saying there seems to be a tradeoff that developers
have to handle between:

   "keep your rows under a certain size" vs
"keep data that's queried together, on disk together"

How would you handle this tradeoff in my case:

I monitor about 40,000 independent timeseries streams of data. The streams
have highly variable rates. Each stream has its own row and I go to a new
row every 28 hrs. With this scheme, I see several tens of rows reaching
sizes in the millions of columns within this time bucket (largest I saw was
6.4 million). The sizes of these wide rows are around 400 MBytes
(considerably larger than 60MB).
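
(To make the scheme concrete, the row keys look something like this; the
format and IDs here are just illustrative:

  stream_00042:bucket_2012-03-21T00   <- 28 hr bucket for one stream
  stream_00042:bucket_2012-03-22T04   <- the next bucket for the same stream

so lowering the time bucket just means each stream gets more, shorter rows.)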

Will adding a few tens of wide rows like this every day cause me problems
on the long term? Should I consider lowering the time bucket?

From experience I am a bit reluctant to create too many rows because I see
that reading across multiple rows seriously affects performance. Of course
I will use map-reduce as well ...will it be significantly affected by many
rows?

Cheers,
Alex

On Tue, Mar 20, 2012 at 6:37 PM, aaron morton wrote:

> The reads are only fetching slices of 20 to 100 columns max at a time from
> the row but if the key is planted on one node in the cluster I am concerned
> about that node getting the brunt of traffic.
>
> What RF are you using, how many nodes are in the cluster, what CL do you
> read at ?
>
> If you have lots of nodes that are in different racks the
> NetworkTopologyStrategy will do a better job of distributing read load than
> the SimpleStrategy. The DynamicSnitch can also help distribute load; see
> cassandra.yaml for its configuration.
>
> I thought about breaking the column data into multiple different row keys
> to help distribute throughout the cluster but its so darn handy having all
> the columns in one key!!
>
> If you have a row that will continually grow it is a good idea to
> partition it in some way. Large rows can slow things like compaction and
> repair down. If you have something above 60MB it's starting to slow things
> down. Can you partition by a date range such as month ?
>
> Large rows are also a little slower to query from
> http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
>
> If most reads are only pulling 20 to 100 columns at a time, are there two
> workloads? Is it possible to store just these columns in a separate row? If
> you understand how big a row may get, you may be able to use the row cache
> to improve performance.
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 20/03/2012, at 2:05 PM, Blake Starkenburg wrote:
>
> I have a row key which is now up to 125,000 columns (and anticipated to
> grow), I know this is a far-cry from the 2-billion columns a single row key
> can store in Cassandra but my concern is the amount of reads that this
> specific row key may get compared to other row keys. This particular row
> key houses column data associated with one of the more popular areas of the
> site. The reads are only fetching slices of 20 to 100 columns max at a time
> from the row but if the key is planted on one node in the cluster I am
> concerned about that node getting the brunt of traffic.
>
> I thought about breaking the column data into multiple different row keys
> to help distribute throughout the cluster but its so darn handy having all
> the columns in one key!!
>
> key_cache is enabled but row cache is disabled on the column family.
>
> Should I be concerned going forward? Any particular advice on large wide
> rows?
>
> Thanks!
>
>
>


replication in a 3 data center setup

2012-03-19 Thread Alexandru Sicoe
Hi everyone,

Say you have 3 data centers (DC1, DC2 and DC3) with 3 nodes each, and a
keyspace whose strategy options give each DC 2 replicas. If you only write
to the nodes in DC1, what path do the replicas take? Assume you've correctly
interleaved the tokens of all the nodes [(DC1: x,y,z), (DC2: x+1,y+1,z+1),
(DC3: x+2,y+2,z+2)].
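
For reference, the keyspace I have in mind would be defined roughly like this
from cassandra-cli (the name is a placeholder; treat the exact syntax as
approximate):

  create keyspace MyKS
    with placement_strategy = 'NetworkTopologyStrategy'
    and strategy_options = {DC1: 2, DC2: 2, DC3: 2};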

More exactly, if you write a record in a node in DC1 will it send one
replica of it to DC2 and then another replica to DC3? Or will the node in
DC2 replicate the record to DC3 in a chain effect?

I understand that each DC handles its own internal replication (after a
node receives one replica).

I need to understand this because the connection between DC1 and DC2/DC3 is
limited, and ideally I would only want to send a replica to DC2 and have DC2
send a replica to DC3. Is this possible?

Cheers,
Alex


Re: Datastax Enterprise mixed workload cluster configuration

2012-03-16 Thread Alexandru Sicoe
Hi,

Since this thread already contains the system setup, I just want to ask
another question:

If you have 3 data centers (DC1, DC2 and DC3) and a keyspace where the
strategy options are such that each DC gets one replica, and you only write
to the nodes in DC1, what is the path the replicas take, assuming you've
correctly interleaved and evenly spaced the tokens of all the nodes?
If you write a record to a node in DC1, will it replicate it to the node in
DC2 and the node in DC2 will replicate it to the node in DC3? Or will the
node in DC1 replicate the record both to DC2 and DC3?

Cheers,
Alex

On Thu, Mar 15, 2012 at 11:26 PM, Alexandru Sicoe  wrote:

> Sorry for that last message, I was confused because I thought I needed to
> use the DseSimpleSnitch but of course I can use the PropertyFileSnitch and
> that allows me to get the configuration with 3 data centers explained.
>
> Cheers,
> Alex
>
>
> On Thu, Mar 15, 2012 at 10:56 AM, Alexandru Sicoe wrote:
>
>> Thanks Tyler,
>>  I see that cassandra.yaml has "endpoint_snitch:
>> com.datastax.bdp.snitch.DseSimpleSnitch". Will this pick up the
>> configuration from the cassandra-topology.properties file as does the
>> PropertyFileSnitch? Or is there some other way of telling it which nodes
>> are in which DC?
>>
>> Cheers,
>> Alex
>>
>>
>> On Wed, Mar 14, 2012 at 9:09 PM, Tyler Hobbs  wrote:
>>
>>> Yes, you can do this.
>>>
>>> You will want to have three DCs: DC1 with [1, 2, 3], DC2 with [4, 5, 6],
>>> and DC3 with [7, 8, 9].  For your normal data keyspace, the replication
>>> strategy should be NTS, and the strategy_options should have some replicas
>>> in each of the three DCs.  For example: {DC1: 3, DC2: 3, DC3: 3} if you
>>> need that level of replication in each one (although you probably only want
>>> an RF of 1 for DC3).
>>>
>>> Your clients that are performing writes should only open connections
>>> against the nodes in DC1, and you should write at CL.ONE or
>>> CL.LOCAL_QUORUM.  Likewise for reads, your clients should only connect to
>>> nodes in DC2, and you should read at CL.ONE or CL.LOCAL_QUORUM.
>>>
>>> The nodes in DC3 should run as analytics nodes.  I believe the default
>>> CL for m/r jobs is ONE, which would work.
>>>
>>> As far as tokens go, interleaving all three DCs and evenly spacing the
>>> tokens will work.  For example, the ordering of your nodes might be [1, 4,
>>> 7, 2, 5, 8, 3, 6, 9].
>>>
>>>
>>> On Wed, Mar 14, 2012 at 12:05 PM, Alexandru Sicoe wrote:
>>>
>>>> Hi everyone,
>>>>  I want to test out the Datastax Enterprise software to have a mixed
>>>> workload setup with an analytics and a real time part.
>>>>
>>>>  However I am not sure how to configure it to achieve what I want: I
>>>> will have 3 real machines on one side of a gateway (1,2,3) and 6 VMs on
>>>> another(4,5,6).
>>>>  1,2,3 will each have a normal Cassandra node that just takes data
>>>> directly from my data sources. I want them to replicate the data to the
>>>> other 6 VMs. Now, out of those 6 VMs 4,5,6 will run normal Cassandra nodes
>>>> and 7,8,9 will run Analytics nodes. So I only want to write to the 1,2,3
>>>> and I only want to serve user reads from 4,5,6 and do analytics on 7,8,9.
>>>> Can I achieve this by configuring 1,2,3,4,5,6 as normal nodes and the rest
>>>> as analytics nodes? If I alternate the tokens as it's explained in
>>>> http://www.datastax.com/docs/1.0/datastax_enterprise/init_dse_cluster#init-dseis
>>>>  it analoguous to achieving something like 3 DCs each getting their own
>>>> replica?
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>>
>>>
>>>
>>> --
>>> Tyler Hobbs
>>> DataStax <http://datastax.com/>
>>>
>>>
>>
>


Re: Datastax Enterprise mixed workload cluster configuration

2012-03-15 Thread Alexandru Sicoe
Sorry for that last message, I was confused because I thought I needed to
use the DseSimpleSnitch, but of course I can use the PropertyFileSnitch and
that allows me to get the 3 data center configuration you explained.
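
Something along these lines is what I plan to put in
cassandra-topology.properties on every node (IPs are placeholders):

  # real machines taking the writes
  192.168.1.1=DC1:RAC1
  192.168.1.2=DC1:RAC1
  192.168.1.3=DC1:RAC1
  # VMs serving real time reads
  10.0.0.4=DC2:RAC1
  10.0.0.5=DC2:RAC1
  10.0.0.6=DC2:RAC1
  # VMs running analytics
  10.0.0.7=DC3:RAC1
  10.0.0.8=DC3:RAC1
  10.0.0.9=DC3:RAC1
  # unknown nodes
  default=DC1:RAC1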

Cheers,
Alex

On Thu, Mar 15, 2012 at 10:56 AM, Alexandru Sicoe  wrote:

> Thanks Tyler,
>  I see that cassandra.yaml has "endpoint_snitch:
> com.datastax.bdp.snitch.DseSimpleSnitch". Will this pick up the
> configuration from the cassandra-topology.properties file as does the
> PropertyFileSnitch? Or is there some other way of telling it which nodes
> are in which DC?
>
> Cheers,
> Alex
>
>
> On Wed, Mar 14, 2012 at 9:09 PM, Tyler Hobbs  wrote:
>
>> Yes, you can do this.
>>
>> You will want to have three DCs: DC1 with [1, 2, 3], DC2 with [4, 5, 6],
>> and DC3 with [7, 8, 9].  For your normal data keyspace, the replication
>> strategy should be NTS, and the strategy_options should have some replicas
>> in each of the three DCs.  For example: {DC1: 3, DC2: 3, DC3: 3} if you
>> need that level of replication in each one (although you probably only want
>> an RF of 1 for DC3).
>>
>> Your clients that are performing writes should only open connections
>> against the nodes in DC1, and you should write at CL.ONE or
>> CL.LOCAL_QUORUM.  Likewise for reads, your clients should only connect to
>> nodes in DC2, and you should read at CL.ONE or CL.LOCAL_QUORUM.
>>
>> The nodes in DC3 should run as analytics nodes.  I believe the default CL
>> for m/r jobs is ONE, which would work.
>>
>> As far as tokens go, interleaving all three DCs and evenly spacing the
>> tokens will work.  For example, the ordering of your nodes might be [1, 4,
>> 7, 2, 5, 8, 3, 6, 9].
>>
>>
>> On Wed, Mar 14, 2012 at 12:05 PM, Alexandru Sicoe wrote:
>>
>>> Hi everyone,
>>>  I want to test out the Datastax Enterprise software to have a mixed
>>> workload setup with an analytics and a real time part.
>>>
>>>  However I am not sure how to configure it to achieve what I want: I
>>> will have 3 real machines on one side of a gateway (1,2,3) and 6 VMs on
>>> another(4,5,6).
>>>  1,2,3 will each have a normal Cassandra node that just takes data
>>> directly from my data sources. I want them to replicate the data to the
>>> other 6 VMs. Now, out of those 6 VMs 4,5,6 will run normal Cassandra nodes
>>> and 7,8,9 will run Analytics nodes. So I only want to write to the 1,2,3
>>> and I only want to serve user reads from 4,5,6 and do analytics on 7,8,9.
>>> Can I achieve this by configuring 1,2,3,4,5,6 as normal nodes and the rest
>>> as analytics nodes? If I alternate the tokens as it's explained in
>>> http://www.datastax.com/docs/1.0/datastax_enterprise/init_dse_cluster#init-dseis
>>>  it analoguous to achieving something like 3 DCs each getting their own
>>> replica?
>>>
>>> Thanks,
>>> Alex
>>>
>>>
>>
>>
>> --
>> Tyler Hobbs
>> DataStax <http://datastax.com/>
>>
>>
>


Re: Datastax Enterprise mixed workload cluster configuration

2012-03-15 Thread Alexandru Sicoe
Thanks Tyler,
 I see that cassandra.yaml has "endpoint_snitch:
com.datastax.bdp.snitch.DseSimpleSnitch". Will this pick up the
configuration from the cassandra-topology.properties file as does the
PropertyFileSnitch? Or is there some other way of telling it which nodes
are in which DC?

Cheers,
Alex

On Wed, Mar 14, 2012 at 9:09 PM, Tyler Hobbs  wrote:

> Yes, you can do this.
>
> You will want to have three DCs: DC1 with [1, 2, 3], DC2 with [4, 5, 6],
> and DC3 with [7, 8, 9].  For your normal data keyspace, the replication
> strategy should be NTS, and the strategy_options should have some replicas
> in each of the three DCs.  For example: {DC1: 3, DC2: 3, DC3: 3} if you
> need that level of replication in each one (although you probably only want
> an RF of 1 for DC3).
>
> Your clients that are performing writes should only open connections
> against the nodes in DC1, and you should write at CL.ONE or
> CL.LOCAL_QUORUM.  Likewise for reads, your clients should only connect to
> nodes in DC2, and you should read at CL.ONE or CL.LOCAL_QUORUM.
>
> The nodes in DC3 should run as analytics nodes.  I believe the default CL
> for m/r jobs is ONE, which would work.
>
> As far as tokens go, interleaving all three DCs and evenly spacing the
> tokens will work.  For example, the ordering of your nodes might be [1, 4,
> 7, 2, 5, 8, 3, 6, 9].
>
>
> On Wed, Mar 14, 2012 at 12:05 PM, Alexandru Sicoe wrote:
>
>> Hi everyone,
>>  I want to test out the Datastax Enterprise software to have a mixed
>> workload setup with an analytics and a real time part.
>>
>>  However I am not sure how to configure it to achieve what I want: I will
>> have 3 real machines on one side of a gateway (1,2,3) and 6 VMs on
>> another(4,5,6).
>>  1,2,3 will each have a normal Cassandra node that just takes data
>> directly from my data sources. I want them to replicate the data to the
>> other 6 VMs. Now, out of those 6 VMs 4,5,6 will run normal Cassandra nodes
>> and 7,8,9 will run Analytics nodes. So I only want to write to the 1,2,3
>> and I only want to serve user reads from 4,5,6 and do analytics on 7,8,9.
>> Can I achieve this by configuring 1,2,3,4,5,6 as normal nodes and the rest
>> as analytics nodes? If I alternate the tokens as explained in
>> http://www.datastax.com/docs/1.0/datastax_enterprise/init_dse_cluster#init-dse
>> is it analogous to achieving something like 3 DCs each getting their own
>> replica?
>>
>> Thanks,
>> Alex
>>
>>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
>


Datastax Enterprise mixed workload cluster configuration

2012-03-14 Thread Alexandru Sicoe
Hi everyone,
 I want to test out the Datastax Enterprise software to have a mixed
workload setup with an analytics and a real time part.

 However I am not sure how to configure it to achieve what I want: I will
have 3 real machines on one side of a gateway (1,2,3) and 6 VMs on the
other (4-9).
 1,2,3 will each have a normal Cassandra node that just takes data directly
from my data sources. I want them to replicate the data to the other 6 VMs.
Now, out of those 6 VMs 4,5,6 will run normal Cassandra nodes and 7,8,9
will run Analytics nodes. So I only want to write to the 1,2,3 and I only
want to serve user reads from 4,5,6 and do analytics on 7,8,9.  Can I
achieve this by configuring 1,2,3,4,5,6 as normal nodes and the rest as
analytics nodes? If I alternate the tokens as explained in
http://www.datastax.com/docs/1.0/datastax_enterprise/init_dse_cluster#init-dse
is it analogous to achieving something like 3 DCs each getting their own
replica?

Thanks,
Alex


Re: unidirectional communication/replication

2012-02-29 Thread Alexandru Sicoe
On Sun, Feb 26, 2012 at 8:24 PM, aaron morton wrote:

> All nodes in the cluster need two-way communication. Nodes need to talk
> Gossip to each other so they know they are alive.
>
> If you need to dump a lot of data consider the Hadoop integration.
> http://wiki.apache.org/cassandra/HadoopSupport It can run a bit faster
> than going through the thrift api.
>

Thanks for the suggestion, I will look into it.


> Copying sstables may be another option depending on the data size.
>

The problem with this is that an SSTable, from what I understand, is per
CF. Since I will want to do semi real time replication of just the latest
data added, this won't work because I would be copying over all the data in
the CF.

Cheers,
A


>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 25/02/2012, at 3:21 AM, Alexandru Sicoe wrote:
>
> Hello everyone,
>
> I'm battling with this constraint that I have: I need to regularly ship
> timeseries data from a Cassandra cluster that sits within an enclosed
> network to outside of the network.
>
> I tried to select all the data within a certain time window, writing to a
> file, and then copying the file out, but this hits the I/O performance
> because even for a small time window (say 5 mins) I am hitting more than a
> million rows.
>
> It would really help if I used Cassandra to replicate the data
> automatically outside. The problem is they will only allow me to have
> outbound traffic out of the enclosed network (not inbound). Is there any
> way to configure the cluster or have 2 data centers in such a way that the
> data center (node or cluster) outside of the enclosed network only gets a
> replica of the data, without ever needing to communicate anything back?
>
> I appreciate the help,
> Alex
>
>
>


unidirectional communication/replication

2012-02-24 Thread Alexandru Sicoe
Hello everyone,

I'm battling with this constraint that I have: I need to regularly ship
timeseries data from a Cassandra cluster that sits within an enclosed
network to outside of the network.

I tried to select all the data within a certain time window, writing to a
file, and then copying the file out, but this hits the I/O performance
because even for a small time window (say 5 mins) I am hitting more than a
million rows.

It would really help if I used Cassandra to replicate the data
automatically outside. The problem is they will only allow me to have
outbound traffic out of the enclosed network (not inbound). Is there any
way to configure the cluster or have 2 data centers in such a way that the
data center (node or cluster) outside of the enclosed network only gets a
replica of the data, without ever needing to communicate anything back?

I appreciate the help,
Alex


Re: Querying all keys in a column family

2012-02-24 Thread Alexandru Sicoe
Hi Aaron and Martin,

Sorry about my previous reply, I thought you wanted to process just the
row keys of the CF.

I have a similar issue to Martin's because I see myself being forced to hit
more than a million rows with a query (I only get a few columns from every
row). Aaron, we've talked about this in another thread; basically I am
constrained to ship out a window of data from my online cluster to an
offline cluster. For this I need to read, for example, a 5 min window of all
the data I have. This simply accesses too many rows and I am hitting the
I/O limit on the nodes. As I understand it, every row read does 2 random
disk seeks (I have no caches).

My question is, what can I do to improve the performance of shipping
windows of data entirely out?

Martin, did you use Hadoop as Aaron suggested? How did that work with
Cassandra? I don't understand how accessing 1 million rows through map
reduce jobs would be any faster.

Cheers,
Alexandru


On Tue, Feb 14, 2012 at 10:00 AM, aaron morton wrote:

> If you want to process 1 million rows use Hadoop with Hive or Pig. If you
> use Hadoop you are not doing things in real time.
>
> You may need to rephrase the problem.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 14/02/2012, at 11:00 AM, Martin Arrowsmith wrote:
>
> Hi Experts,
>
> My program is such that it queries all keys on Cassandra. I want to do
> this as quick as possible, in order to get as close to real-time as
> possible.
>
> One solution I heard was to use the sstables2json tool, and read the data
> in as JSON. I understand that reading from each line in Cassandra might
> take longer.
>
> Are there any other ideas for doing this ? Or can you confirm that
> sstables2json is the way to go.
>
> Querying 100 rows in Cassandra the normal way is fast enough. I'd like to
> query a million rows, do some calculations on them, and spit out the result
> like it's real time.
>
> Thanks for any help you can give,
>
> Martin
>
>
>


Re: Querying all keys in a column family

2012-02-14 Thread Alexandru Sicoe
Hey Martin,
 Have you tried CQL query: "SELECT FIRST 0 * FROM cfName" ?
Cheers,
Alex

On Mon, Feb 13, 2012 at 11:00 PM, Martin Arrowsmith <
arrowsmith.mar...@gmail.com> wrote:

> Hi Experts,
>
> My program is such that it queries all keys on Cassandra. I want to do
> this as quick as possible, in order to get as close to real-time as
> possible.
>
> One solution I heard was to use the sstables2json tool, and read the data
> in as JSON. I understand that reading from each line in Cassandra might
> take longer.
>
> Are there any other ideas for doing this ? Or can you confirm that
> sstables2json is the way to go.
>
> Querying 100 rows in Cassandra the normal way is fast enough. I'd like to
> query a million rows, do some calculations on them, and spit out the result
> like it's real time.
>
> Thanks for any help you can give,
>
> Martin
>


Re: How does Cassandra decide when to do a minor compaction?

2012-01-07 Thread Alexandru Sicoe
Hi Maxim,
 Why do you need to know this?

Cheers,
Alex

On Sat, Jan 7, 2012 at 10:03 AM, aaron morton wrote:

>
> http://www.datastax.com/docs/1.0/operations/tuning#tuning-compaction
>
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 7/01/2012, at 3:17 PM, Maxim Potekhin wrote:
>
> The subject says it all -- pointers appreciated.
>
> Thanks
>
> Maxim
>
>
>


Re: emptying my cluster

2012-01-05 Thread Alexandru Sicoe
Hi,

On Wed, Jan 4, 2012 at 9:54 PM, aaron morton wrote:

> Some thoughts on the plan:
>
> * You are monkeying around with things, do not be surprised when
> surprising things happen.
>

I am just trying to explore different solutions for solving my problem.


> * Deliberately unbalancing the cluster may lead to Bad Things happening.
>

I will take your advice on this. I would have liked to have an extra node
to have 2 nodes in each DC.


> * In the design discussed it is perfectly reasonable for data not to be on
> the archive node.
>

You mean when having the 2 DC setup I mentioned and using TTL? If I
have the 2 DC setup but don't use TTL, I don't understand why data wouldn't
be on the archive node.


> * Truncate is a cluster wide operation and all nodes must be online before
> it will start.
>
* Truncate will snapshot before deleting data, you could use this snapshot.
> * TTL for a column is for a column no matter which node it is on.
>

Thanks for clarifying these!


> * IMHO Cassandra data files (sstables or JSON dumps) are not a good format
> for a historical archive, nothing against Cassandra. You need the lowest
> common format.
>

So what data format should I use for historical archiving?


>
> If you have the resources for a second cluster could you put the two
> together and just have one cluster with a very large retention policy? One
> cluster is easier than two.
>

I am constrained to have limited retention on the Cassandra cluster that is
collecting the data. Once I archive the data for long term storage I
cannot bring it back into the same Cassandra cluster that collected it in the
first place because it's in an enclosed network with strict rules. I have
to load it into another cluster outside the enclosed network. It's not that I
have the resources for a second cluster; I am forced to use a second
cluster.


>
> Assuming there is no business case for this, consider either:
>
> * Dumping the historical data into a Hadoop (with or without HDFS) cluster
> with high compression. If needed you could then run Hive / Pig to fill a
> companion Cassandra cluster with data on demand. Or just query using Hadoop.
> * Dumping the historical data to files with high compression and a roll
> your own solution to fill a cluster.
>
> Ok, thanks for these suggestions, I will have to investigate further.


> Also considering talking to Data Stax about DSE.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/01/2012, at 1:41 AM, Alexandru Sicoe wrote:
>
>
Cheers,
Alex

> Hi,
>
> On Tue, Jan 3, 2012 at 8:19 PM, aaron morton wrote:
>
>>   Running a time based rolling window of data can be done using the TTL.
>> Backing up the nodes for disaster recovery can be done using snapshots.
>> Restoring any point in time will be tricky because you may restore columns
>> where the TTL has expired.
>>
>
> Yeah, that's the thing...if I want to use the system as I explain further
> below, I cannot do backing up of data (for later restoration) if I'm using
> TTLs.
>
>
>>
>> Will I get a single copy of the data in the remote storage or will it be
>> twice the data (data + replica)?
>>
>> You will get RF copies of the data. (By the way, there is no original copy.)
>>
>
> Well, if I organize the cluster as I mentioned in the first email, I will
> get one copy of each row at a certain point in time on node2 if I take it
> offline, perform a major compaction and GC, won't I? I don't want to send
> duplicated data to the mass storage!
>
>
>>
>> Can you share a bit more about the use case ? How much data and what sort
>> of read patterns ?
>>
>>
> I have several applications that feed into Cassandra about 2 million
> different variables (each representing a different monitoring
> value/channel). The system receives updates for each of these monitoring
> values at different rates. For each new update, the timestamp and value are
> recorded in a Cassandra name-value pair. The schema of Cassandra is built
> using one CF for data and 4 other CFs for metadata (metadata CFs are static
> - don't grow almost at all once they've been loaded). The data CF uses a
> row for each variable. Each row acts as a 4 hour time bin. I achieve this
> by creating the row key as a concatenation of  the first 6 digits of the
> timestamp at which the data is inserted + the unique ID of the variable.
> After the time bin expires, a new row will be created for the same variable
> ID.
>
> The system can currently sustain the insertion load. Now I'm looking into 
> organizing
> the flow of data out of th

Re: emptying my cluster

2012-01-04 Thread Alexandru Sicoe
Hi,

On Tue, Jan 3, 2012 at 8:19 PM, aaron morton wrote:

>   Running a time based rolling window of data can be done using the TTL.
> Backing up the nodes for disaster recovery can be done using snapshots.
> Restoring any point in time will be tricky because you may restore columns
> where the TTL has expired.
>

Yeah, that's the thing...if I want to use the system as I explain further
below, I cannot do backing up of data (for later restoration) if I'm using
TTLs.


>
> Will I get a single copy of the data in the remote storage or will it be
> twice the data (data + replica)?
>
> You will get RF copies of the data. (By the way, there is no original copy.)
>

Well, if I organize the cluster as I mentioned in the first email, I will
get one copy of each row at a certain point in time on node2 if I take it
offline, perform a major compaction and GC, won't I? I don't want to send
duplicated data to the mass storage!


>
> Can you share a bit more about the use case ? How much data and what sort
> of read patterns ?
>
>
I have several applications that feed into Cassandra about 2 million
different variables (each representing a different monitoring
value/channel). The system receives updates for each of these monitoring
values at different rates. For each new update, the timestamp and value are
recorded in a Cassandra name-value pair. The schema of Cassandra is built
using one CF for data and 4 other CFs for metadata (metadata CFs are static
- don't grow almost at all once they've been loaded). The data CF uses a
row for each variable. Each row acts as a 4 hour time bin. I achieve this
by creating the row key as a concatenation of  the first 6 digits of the
timestamp at which the data is inserted + the unique ID of the variable.
After the time bin expires, a new row will be created for the same variable
ID.
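
(As an illustration with made-up numbers: for an epoch-seconds timestamp
1325592345 and variable ID var_0042, the row key would be "132559" +
"var_0042" = "132559var_0042"; once the leading digits roll over, new columns
for the same variable go to row "132560var_0042".)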

The system can currently sustain the insertion load. Now I'm looking into
organizing the flow of data out of the cluster and retrieval performance
for random queries:

Why do I need to organize the data out? Well, my requirement is to keep all
the data coming into the system at the highest granularity for long term
(several years). The 3 node cluster I mentioned is the online cluster which
is supposed to be able to absorb the input load for a relatively short
period of time, a few weeks (I am constrained to do this). After this
period the data has to be shipped out of the cluster in a mass storage
facility and the cluster needs to be emptied to make room for more data.
Also, the online cluster will serve reads while it takes in data. For older
data I am planning to have another cluster that gets loaded with data from
the storage facility on demand and will serve reads from there.

Why random queries? There is no specific use case for them, which is why I
want to rely only on the built-in Cassandra indexes for now. Generally the
client will ask for sets of values within a time range up to 8-10 hours in
the past. Apart from some sets of variables that will almost always be
asked for together, any combination is possible, because this system will
feed into a web dashboard used for debugging purposes - to correlate and
aggregate streams of variables. Depending on the problem, different
variable combinations could be investigated.


> Can you split the data stream into a permanent log record and also into
> cassandra for a rolling window of query able data ?
>

In the end, essentially that's what I've been meaning to do by organizing
the cluster in a 2 DC setup: I wanted to have 2 nodes in DC1 taking the
data and reads (the rolling window) and replicating to the node in DC2 (the
permanent log - a single copy of the data). I was thinking of
implementing the rolling window by emptying the nodes in DC1 using truncate,
instead of the TTL-based rolling window you are proposing now.

Ok, so I can do what you are saying easily if Cassandra allows me to have a
TTL only on the first copy of the data and have the second replica without
a TTL. Is this possible? I think it would solve my problem, as long as I
can backup and empty the node in DC2 before the TTLs expire in the other 2
nodes.
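
(As a side note, this is roughly how a TTL gets attached at write time with
Hector - a sketch under the assumption that the createColumn overload taking a
ttl argument is available in Hector 0.8.0-2, and with cluster/keyspace/CF/key
names that just mirror the examples in this thread. The TTL is stored with the
column itself.)

import me.prettyprint.cassandra.serializers.DoubleSerializer;
import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public final class TtlWriteSketch {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("monitoring", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("XXX", cluster);
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());

        int ttlSeconds = 30 * 24 * 3600; // hypothetical one-month window
        // Column name = timestamp (LongType comparator), value = one sample.
        // NOTE: assumes this createColumn overload (with a ttl) exists in your
        // Hector version - double-check before relying on it.
        HColumn<Long, Double> column = HFactory.createColumn(
                System.currentTimeMillis(), 42.0, ttlSeconds,
                LongSerializer.get(), DoubleSerializer.get());

        mutator.insert("132559var42", "YYY", column); // row key as described above
    }
}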

Cheers,
Alex


> Cheers
>
>   -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 3/01/2012, at 11:41 PM, Alexandru Sicoe wrote:
>
> Hi,
>
> I need to build a system that stores data for years, so yes, I am backing
> up data in another mass storage system from where it could be later
> accessed. The data that I successfully back up has to be deleted from my
> cluster to make space for new data coming in.
>
> I was aware about the snapshotting which I will use for getting the data
> out of node2: it creates hard links to the SSTables of a CF and then I can
> copy over those files pointed to by the hard lin

Re: emptying my cluster

2012-01-03 Thread Alexandru Sicoe
Hi,

I need to build a system that stores data for years, so yes, I am backing
up data in another mass storage system from where it could be later
accessed. The data that I successfully back up has to be deleted from my
cluster to make space for new data coming in.

I was aware about the snapshotting which I will use for getting the data
out of node2: it creates hard links to the SSTables of a CF and then I can
copy over those files pointed to by the hard links into another location.
After that I get rid of the snapshot (hard links) and then I can truncate
my CFs. It's clear that snapshotting will give me a single copy of the data
in case I have a unique copy of the data on one node. It's not clear to me
what happens if I have let's say a cluster with 3 nodes and RF=2 and I do a
snapshot of every node and copy those snapshots to remote storage. Will I
get a single copy of the data in the remote storage or will it be twice the
data (data + replica)?

I've started reading about TTL and I think I can use it but it's not clear
to me how it would work in conjunction with the snapshotting/backing up I
need to do. I mean, it will impose a deadline by which I need to perform a
backup in order not to miss any data. Also, I might duplicate the data if
some columns don't expire fully between 2 backups. Any clarifications on
this?
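
(Roughly, as a worked example with made-up numbers: with a 30-day TTL and a
snapshot every 7 days, nothing is lost as long as the backup interval stays
below the TTL, but consecutive snapshots overlap by ~23 days' worth of
still-live columns - that overlap is exactly the duplication I'd have to
deduplicate downstream.)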

Cheers,
Alex

On Tue, Jan 3, 2012 at 9:44 AM, aaron morton wrote:

> That sounds a little complicated.
>
> Do you want to get the data out for an off node backup or is it for
> processing in another system ?
>
> You may get by using:
>
> * TTL to expire data via compaction
> * snapshots for backups
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 3/01/2012, at 11:00 AM, Alexandru Sicoe wrote:
>
> Hi everyone and Happy New Year!
>
> I need advice for organizing data flow outside of my 3 node Cassandra
> 0.8.6 cluster. I am configuring my keyspace to use the
> NetworkTopologyStrategy. I have 2 data centers each with a replication
> factor of 1 (i.e. DC1:1; DC2:1). The configuration of the
> PropertyFileSnitch is:
>
>
> ip_node1=DC1:RAC1
> ip_node2=DC2:RAC1
> ip_node3=DC1:RAC1
>
> I assign tokens like this:
> node1 = 0
> node2 = 1
> node3 = 85070591730234615865843651857942052864
>
> My write consistency level is ANY.
>
> My data sources are only inserting data in node1 & node3. Essentially what
> happens is that a replica of every input value will end up on node2. Node 2
> thus has a copy of the entire data written to the cluster. When Node2
> starts getting full, I want to have a script which pulls it off-line and
> does a sequence of operations (compaction/snapshotting/exporting/truncating
> the CFs) in order to back up the data in a remote place and to free it up
> so that it can take more data. When it comes back on-line it will take
> hints from the other 2 nodes.
>
> This is how I plan on shipping data out of my cluster without any downtime
> or any major performance penalty. The problem is that I also want to
> truncate the CFs on node1 & node3 to free them of data. I don't
> know whether I can do this without any downtime or without any serious
> performance penalties. Is anyone using truncate to free up CFs of data? How
> efficient is this?
>
> Any observations or suggestions are much appreciated!
>
> Cheers,
> Alex
>
>
>


emptying my cluster

2012-01-02 Thread Alexandru Sicoe
Hi everyone and Happy New Year!

I need advice for organizing data flow outside of my 3 node Cassandra 0.8.6
cluster. I am configuring my keyspace to use the NetworkTopologyStrategy. I
have 2 data centers, each with a replication factor of 1 (i.e. DC1:1; DC2:1).
The configuration of the PropertyFileSnitch is:


ip_node1=DC1:RAC1
ip_node2=DC2:RAC1
ip_node3=DC1:RAC1

I assign tokens like this:
node1 = 0
node2 = 1
node3 = 85070591730234615865843651857942052864

My write consistency level is ANY.

My data sources are only inserting data in node1 & node3. Essentially what
happens is that a replica of every input value will end up on node2. Node 2
thus has a copy of the entire data written to the cluster. When Node2
starts getting full, I want to have a script which pulls it off-line and
does a sequence of operations (compaction/snapshotting/exporting/truncating
the CFs) in order to back up the data in a remote place and to free it up
so that it can take more data. When it comes back on-line it will take
hints from the other 2 nodes.

This is how I plan on shipping data out of my cluster without any downtime
or any major performance penalty. The problem is that I also want to
truncate the CFs on node1 & node3 to free them of data. I don't
know whether I can do this without any downtime or without any serious
performance penalties. Is anyone using truncate to free up CFs of data? How
efficient is this?

Any observations or suggestions are much appreciated!

Cheers,
Alex


Cassandra cluster HW spec (commit log directory vs data file directory)

2011-10-25 Thread Alexandru Sicoe
Hi everyone,

I am currently in the process of writing a hardware proposal for a Cassandra
cluster for storing a lot of monitoring time series data. My workload is
write intensive and my data set is extremely varied in types of variables
and insertion rates for these variables (I will have to handle on the order
of 2 million variables coming in, each at a very different rate - the
majority will come at very low rates, many will come at higher, constant
rates, and a few will come in with huge spikes in rate). These variables
correspond to all basic C++ types and arrays of these types. The highest
insertion rates are received for basic types, of which U32 variables seem to
be the most prevalent (e.g. I recorded 2 million U32 values inserted in 8
minutes of operation, while 600,000 doubles and 170,000 strings were inserted
during the same time; note this measurement was only for a subset of the
total data currently taken in).
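
(Roughly: (2,000,000 + 600,000 + 170,000) values over those 8 minutes is about
2.77 million values / 480 s, i.e. on the order of 5,800 inserts/s for that
subset alone.)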

At the moment I am partitioning the data in Cassandra into 75 CFs (each CF
corresponds to a logical partitioning of the set of variables mentioned
before - but this partitioning is not related to the amount of data or
rates...it is somewhat random). These 75 CFs account for ~1 million of the
variables I need to store. I have a 3 node Cassandra 0.8.5 cluster (each
node has 4 physical cores and 4 GB RAM, with the commit log directory and
data file directory split between two RAID arrays of HDDs). I can handle the
load in this configuration, but the average CPU usage of the Cassandra nodes is
slightly above 50%. As I will need to add 12 more CFs (corresponding to
another ~ 1 million variables) plus potentially other data later, it is
clear that I need better hardware (also for the retrieval part).

I am looking at Dell servers (PowerEdge, etc.)

Questions:

1. Is anyone using Dell HW for their Cassandra clusters? How do they behave?
Anybody care to share their configurations or tips for buying, what to avoid
etc?

2. Obviously I am going to follow the advice on
http://wiki.apache.org/cassandra/CassandraHardware and split the commitlog
and data onto separate disks. I was going to use an SSD for the commitlog,
but then did some more research and found out that it doesn't make sense to
use SSDs for sequential appends, because they offer no real performance
advantage over rotational media there. So I am going to use a rotational
disk for the commit log and an SSD for data. Does this make sense?

3. What's the best way to find out how big my commitlog disk and my data
disk have to be? The Cassandra hardware page says the commitlog disk
doesn't need to be big, but I still need to choose a size!

4. I also noticed RAID 0 configuration is recommended for the data file
directory. Can anyone explain why?

Sorry for the huge email.

Cheers,
Alex


Re: CQL select not working for CF defined programmatically with Hector API

2011-10-05 Thread Alexandru Sicoe
Perfectly right. Sorry for not paying attention!
Thanks Eric,
Alex

On Tue, Oct 4, 2011 at 4:19 AM, Eric Evans  wrote:

> On Mon, Oct 3, 2011 at 12:02 PM, Alexandru Sicoe 
> wrote:
> > Hi,
> >  I am using Cassandra 0.8.5, Hector 0.8.0-2 and cqlsh (cql 1.0.3). If I
> > define a CF with comparator LongType like this:
> >
> > BasicColumnFamilyDefinition columnFamilyDefinition = new
> > BasicColumnFamilyDefinition();
> > columnFamilyDefinition.setKeyspaceName("XXX");
> > columnFamilyDefinition.setName("YYY");
> > columnFamilyDefinition.setDefaultValidationClass(_BYTESTYPE);
> > columnFamilyDefinition.setMemtableOperationsInMillions(0.1);
> > columnFamilyDefinition.setMemtableThroughputInMb(40);
> >
> columnFamilyDefinition.setComparatorType(ComparatorType.LONGTYPE);
> > try {
> > cluster.addColumnFamily(new
> > ThriftCfDef(columnFamilyDefinition));
> > } catch(HectorException e) {
> > throw e;
> > }
> >
> > Then I put some data in the CF.
> >
> > Then I try to do the following queries in cqlsh:
> >
> >   use XXX;
> >   select * from YYY where KEY='aaa';
> >
> > nothing is returned!
> >
> > If I however do:
> >   select * from YYY;
> >
> > all the results are returned properly!
> >
> > So I have 2 questions:
> > 1) Can I read with CQL if CFs were defined using the basic API? (the fact
> > that select * from YYY; works suggests that this is possible)
> > 2) If yes, what is the correct query to use to read data with CQL? (I
> > suspect KEY is wrong...is there a default?)
>
> I suspect that you did not select a key validation class, and ended up
> with a default of BytesType.  CQL requires that your terms be hex
> encoded when using BytesType.
>
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu
>
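
(For anyone who hits the same thing later: the definition-side fix is to
declare a key validation class up front, so cqlsh can take plain string keys
without hex encoding. A minimal sketch that extends the snippet from the
original mail, assuming your Hector 0.8.0-2 build exposes
setKeyValidationClass - Cassandra 0.8 added key_validation_class to CfDef:)

// ...same columnFamilyDefinition as in the original snippet...
// Assumption: setKeyValidationClass is available on the CF definition in this
// Hector release; with UTF8Type keys, "select * from YYY where KEY='aaa';"
// works as-is in cqlsh.
columnFamilyDefinition.setKeyValidationClass("org.apache.cassandra.db.marshal.UTF8Type");
cluster.addColumnFamily(new ThriftCfDef(columnFamilyDefinition));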


CQL select not working for CF defined programmatically with Hector API

2011-10-03 Thread Alexandru Sicoe
Hi,
 I am using Cassandra 0.8.5, Hector 0.8.0-2 and cqlsh (cql 1.0.3). If I
define a CF with comparator LongType like this:

BasicColumnFamilyDefinition columnFamilyDefinition =
        new BasicColumnFamilyDefinition();
columnFamilyDefinition.setKeyspaceName("XXX");
columnFamilyDefinition.setName("YYY");
columnFamilyDefinition.setDefaultValidationClass(_BYTESTYPE);
columnFamilyDefinition.setMemtableOperationsInMillions(0.1);
columnFamilyDefinition.setMemtableThroughputInMb(40);
columnFamilyDefinition.setComparatorType(ComparatorType.LONGTYPE);
try {
    cluster.addColumnFamily(new ThriftCfDef(columnFamilyDefinition));
} catch (HectorException e) {
    throw e;
}

Then I put some data in the CF.

Then I try to do the following queries in cqlsh:

  use XXX;
  select * from YYY where KEY='aaa';

nothing is returned!

If I however do:
  select * from YYY;

all the results are returned properly!

So I have 2 questions:
1) Can I read with CQL if CFs were defined using the basic API? (the fact
that select * from YYY; works suggests that this is possible)
2) If yes, what is the correct query to use to read data with CQL? (I
suspect KEY is wrong...is there a default?)

Cheers,
Alex