Using Cassandra as an object store

2019-04-18 Thread Gene
Howdy

I'm looking at the possibility of using Cassandra as an object store to
offload image/blob data from an Oracle database.  I've seen mentions of it
being used as an object store at large scale, like at Walmart:

https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593

However, I have found little on small-scale setups, or on whether it's even
worth using Cassandra in place of something purpose-built for object
storage, like Ceph.

Additionally, I've read that Cassandra struggles with storing objects of
10MB or larger, and that it's recommended to break objects up into smaller
chunks.  That requires either some kind of middleware between our
application and Cassandra, or application-side logic to split objects into
smaller chunks and recombine them as needed.
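To make the chunking idea concrete, here is a minimal sketch using the
Python DataStax driver.  The keyspace, table, chunk size and contact point
are all made-up placeholders, not recommendations:

    -- assumed schema:
    CREATE TABLE blobstore.chunks (
        object_id text,
        chunk_no  int,
        data      blob,
        PRIMARY KEY (object_id, chunk_no)
    );

    from cassandra.cluster import Cluster

    CHUNK_SIZE = 1024 * 1024  # 1MB, comfortably under the ~10MB pain point

    session = Cluster(['127.0.0.1']).connect('blobstore')  # assumed contact point
    insert = session.prepare(
        "INSERT INTO chunks (object_id, chunk_no, data) VALUES (?, ?, ?)")
    select = session.prepare("SELECT data FROM chunks WHERE object_id = ?")

    def put_object(object_id, payload):
        # One row per fixed-size slice of the byte string.
        for i in range(0, len(payload), CHUNK_SIZE):
            session.execute(
                insert, (object_id, i // CHUNK_SIZE, payload[i:i + CHUNK_SIZE]))

    def get_object(object_id):
        # Rows come back in clustering order (chunk_no ascending).
        return b''.join(r.data for r in session.execute(select, (object_id,)))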

I've looked into Pithos and Astyanax, but both are no longer developed and
I'm not seeing anything that might replace them in the long term.

https://github.com/exoscale/pithos
https://github.com/Netflix/astyanax

Any helpful information or advice would be greatly appreciated.

Thanks in advance.

-Gene


Re: unsubscribe

2015-10-29 Thread Gene
You have been unsuccessfully unsubscribed from the mailing list.

We are sad to see you go.  While waiting for you to return we will be
watching this on repeat.

https://www.youtube.com/watch?v=eh7lp9umG2I

On Thu, Oct 29, 2015 at 7:55 PM, M.Tarkeshwar Rao 
wrote:

>


Re: Data visualization tools for Cassandra

2015-10-20 Thread Gene
Have you looked at OpsCenter?

On Tue, Oct 20, 2015 at 9:59 AM, Vikram Kone  wrote:

> Hi,
> We are looking for data visualization tools to chart some graphs over data
> present in our cassandra cluster. Are there any open source visualization
> tools that people are using to quickly draw some charts over data in their
> cassandra tables? We are using the DataStax version of Cassandra, in case
> that is relevant.
>
>
> thanks
>


Re: unsubscribe

2015-10-14 Thread Gene
unsubscribe
afmelden
abbestellen
ignorer cette


On Wed, Oct 14, 2015 at 8:24 AM, rock zhang  wrote:

> unsubscribe
>
> On Oct 14, 2015, at 7:28 AM, Alain RODRIGUEZ  wrote:
>
> From http://cassandra.apache.org/#lists --> just email
> user-unsubscr...@cassandra.apache.org
>
>
> Alain
>
> 2015-10-14 14:29 GMT+02:00 Amila Paranawithana :
>
>> unsubscribe
>>
>> On Wed, Oct 14, 2015 at 5:57 PM, Numan Fatih YARCI wrote:
>>
>>> unsubscribe
>>>
>>>
>>
>>
>> --
>>
>> Amila Iroshani Paranawithana, Senior Software Engineer, AdroitLogic
>> | ☎: +94779747398
>> | ✍: http://amilaparanawithana.blogspot.com
>> Skype: amila.paranawithana
>>
>
>
>


Re: Why can't nodetool status include a hostname?

2015-10-08 Thread Gene
Yeah, -r or --resolve-ip is what you're looking for.

Cassandra's nodetool command is kind of wonky.  It's inconsistent across
commands (e.g. sometimes 'keyspace.columnfamily', other times 'keyspace
columnfamily'; pay attention to the separator between the items), it
doesn't resolve IPs by default (while standard Linux commands require you
to pass something like -n to *not* resolve names), and so on.

When in doubt, run nodetool without specifying a command and it'll list all
of the available options (another example of wonkiness: the 'help' argument
is not listed in this output).
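A few examples of what I mean, plus the -r usage (syntax from memory, so
double-check against your version's own listing; the keyspace and table
names are placeholders):

    nodetool cfstats mykeyspace.mytable     # dotted form
    nodetool repair mykeyspace mytable      # space-separated form
    nodetool status -r                      # resolve IPs to hostnames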

-Gene

On Thu, Oct 8, 2015 at 7:01 AM, Paulo Motta 
wrote:

> Have you tried using the -r or --resolve-ip option?
>
> 2015-10-07 19:59 GMT-07:00 Kevin Burton :
>
>> I find it really frustrating that nodetool status doesn't include a
>> hostname.
>>
>> Makes it harder to track down problems.
>>
>> I realize it PRIMARILY uses the IP but perhaps cassandra.yaml can include
>> an optional 'hostname' parameter that can be set by the user.  OR have the
>> box itself include the hostname in gossip when it starts up.
>>
>> I realize that hostname wouldn't be authoritative and that the IP must
>> still be shown but we could add another column for the hostname.
>>
>> --
>>
>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>> Engineers!
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>>
>>
>


Re: Cassandra Configuration VS Static IPs.

2015-10-06 Thread Gene
Rather than using public IPs on the Cassandra nodes at all, I would set up
a VPN:

1. Make sure all Cassandra nodes are on the same private network with
static IPs
2. Set up a new micro instance with a static public IP, make sure it has a
static IP on the private network as well
3. Install OpenVPN on the new micro instance and configure it to tunnel
traffic to that private network (a rough config sketch follows below)
4. Configure your Cassandra nodes to permit access from this instance
5. Configure your workstation to make a VPN connection, and use the static,
private IPs of the Cassandra nodes for your client connectivity
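For step 3, the OpenVPN server config might look something like this (a
rough sketch; the subnets here are made up, so adapt them to your network):

    port 1194
    proto udp
    dev tun
    ca ca.crt
    cert server.crt
    key server.key
    dh dh2048.pem
    server 10.8.0.0 255.255.255.0         # address pool for VPN clients (assumed)
    push "route 10.0.1.0 255.255.255.0"   # route to the private Cassandra subnet (assumed)
    keepalive 10 120
    persist-key
    persist-tun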

-Gene


On Sun, Oct 4, 2015 at 5:31 PM, Renato Perini 
wrote:

> Jonathan, I'm having some difficulty understanding what you're talking
> about. A client is a program connecting to a Cassandra instance. All it
> needs to know to operate is an IP, a keyspace and a table. My client is
> nothing more than a simple textual version of a program like DataStax
> DevCenter. No "same DC" concepts are involved in using it.
> As for AWS, I'm not changing anything. The instances, as I said multiple
> times, don't have an Elastic IP, so the public IP is dynamic. This means it
> changes automatically at every reboot.
>
>
> On 05/10/2015 02:22, Jonathan Haddad wrote:
>
> If your client is in the same DC, then you shouldn't use *public* IP
> addresses.  If you're using a recent version of Cassandra you can just set
> the listen_interface and rpc_interface to whatever network interface you've
> got.
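For illustration, the relevant cassandra.yaml lines would look something
like this (the interface name here is an assumption):

    listen_interface: eth1
    rpc_interface: eth1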
>
> If you're really changing IPs when you reboot machines (I have no idea why
> you'd do this, AWS definitely doesn't work this way) then I think you're
> going to hit a whole set of other issues.
>
>
>> On Sun, Oct 4, 2015 at 7:10 PM Renato Perini
>> <renato.per...@gmail.com> wrote:
>
>> Yes, the client uses the same datacenter (us-west-2).
>> Maybe I haven't explained the situation well. I'm not asking to connect
>> to nodes *without* using a static IP address, but to allow Cassandra to
>> determine the current public address at the time of connection.
>> Spark, for example, uses shell scripts for configuration, so the public
>> IP (in AWS) can be assigned using the command `curl
>> http://169.254.169.254/latest/meta-data/public-ipv4`, whatever it is at
>> the time of boot.
>> Cassandra uses a yaml file for the main configuration, so this is
>> impossible to achieve. Basically I would like to make the client connect
>> correctly to all nodes using their public IPs without being required to
>> know them (the client would discover them dynamically while connecting).
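A minimal sketch of that boot-time templating idea applied to
cassandra.yaml, assuming the EC2 metadata service is reachable and a
conventional config path; purely illustrative, run before starting the
daemon:

    import re
    from urllib.request import urlopen

    # Ask the EC2 metadata service for this instance's current public IP.
    public_ip = urlopen(
        'http://169.254.169.254/latest/meta-data/public-ipv4'
    ).read().decode().strip()

    path = '/etc/cassandra/cassandra.yaml'  # assumed config location
    with open(path) as f:
        conf = f.read()

    # Rewrite broadcast_rpc_address (commented out or not) with today's IP.
    conf = re.sub(r'(?m)^#?\s*broadcast_rpc_address:.*$',
                  'broadcast_rpc_address: ' + public_ip, conf)

    with open(path, 'w') as f:
        f.write(conf)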
>>
>>
>>
>> On 05/10/2015 00:55, Jonathan Haddad wrote:
>>
>> So you're not running the client in the same DC as your Cassandra
>> cluster.  In that case you'll need to be able to connect to the public
>> address of all the nodes.  Technically you could have a whitelist and only
>> connect to one, but I wouldn't recommend it.
>>
>> This is no different than any other database in that you would need a
>> public address to be able to connect to the servers from a machine not in
>> your datacenter.  How else would you connect to them if you don't provide
>> access?
>>
>> On Sun, Oct 4, 2015 at 6:35 PM Renato Perini 
>> wrote:
>>
>>> That seems not to be the case when connecting to my (single) data center
>>> using the Java connector with a small client I have developed for testing.
>>> For the broadcast_rpc_address I have configured the local IP of the
>>> nodes. The cluster works fine and the nodes communicate fairly well using
>>> their local IPs. When I connect to a node (let's say node 1) from the
>>> outside using the Java driver and the node's public IP, the cluster
>>> discovery uses internal IPs for contacting other nodes, (obviously)
>>> leading to errors.
>>>
>>> As for AWS, Elastic IPs are free as long as they're associated with an
>>> instance and the machines are up 24/7. I have to shut down the machines
>>> during the night for various reasons, so unfortunately they're not totally
>>> free for my use case.
>>>
>>>
>>>
>>> On 05/10/2015 00:04, Jonathan Haddad wrote:
>>>
>>> Public IP?  No, not required unless you're running multiple DCs.
>>>
>>> Where are you running a DC where IPs aren't cheap?  If you're in AWS
>>> they're basically free (or at least the cheapest section of your bill by
>>> far)
>>>
>>>
>>>
>>> On Sun, Oct 4, 2015 at 5:59 PM Renato Perini 
>>> wrote:
>>>
>>>> Is Cassandra really supposed to have a static public IP for each and
>>>> every node in the cluster?
>>>> This seems expensive (static IPs are neither free nor cheap), yet
>>>> broadcast_rpc_address expects a static IP for client
>>>> communications (load balancing, contact points, etc.).
>>>> Is there some mechanism to determine a public IP at runtime?
>>>>
>>>> Basically, I have nodes (machines) with dynamic public IPs and I cannot
>>>> embed them in the cassandra.yaml file because of their dynamic nature
>>>> (they change at each reboot).
>>>> Any solution to this?
>>>>
>>>> Thanks.
>>>>
>>>
>>>
>>
>


Re: Subscribe again?

2015-09-11 Thread Gene
Check your spam folder.

On Fri, Sep 11, 2015 at 11:55 AM, Ahamed, Aadil  wrote:

> I suddenly stopped receiving mails from this mailing list. Do I need to
> subscribe again?
>
> Thanks,
> Aadil
>


What is your backup strategy for Cassandra?

2015-09-06 Thread Gene
Hello everyone,

I'm new to this mailing list, and still fairly new to Cassandra.  I'm a
systems administrator and have had a 3-node Cassandra cluster with a
replication factor of 3 running in Production for about a year now.  We
have about 200 GB of data per node currently.

Up until recently I have just been performing snapshots and clearing them
out as needed.  I recently implemented an automated process to perform
snapshots of our data and copy them off of our cluster via rsync+ssh.
Pretty soon I'll also be utilising the incremental backup feature for
sstables (cassandra.yaml:incremental_backups), and will be taking a look at
archiving for commitlog as well (commitlog_archiving.properties).
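In rough outline, that process looks like the sketch below (simplified
Python; the data directory, backup host and tag scheme are placeholders
rather than our exact script):

    import datetime, glob, os, subprocess

    tag = 'backup-' + datetime.date.today().isoformat()
    data_dir = '/var/lib/cassandra/data'  # assumed data directory

    # 1. Snapshot every keyspace under a predictable tag (hard links, cheap).
    subprocess.check_call(['nodetool', 'snapshot', '-t', tag])

    # 2. Snapshots land under <data_dir>/<ks>/<table>/snapshots/<tag>;
    #    ship each one off-node, preserving the relative path (rsync -R).
    for snap in glob.glob(os.path.join(data_dir, '*', '*', 'snapshots', tag)):
        rel = os.path.relpath(snap, data_dir)
        subprocess.check_call(
            ['rsync', '-azR', os.path.join(data_dir, '.', rel),
             'backuphost:/backups/' + os.uname()[1] + '/'])

    # 3. Drop the local snapshot once it has been shipped.
    subprocess.check_call(['nodetool', 'clearsnapshot', '-t', tag])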

I've seen quite a few blog posts here and there about various backup
strategies.  I'm wondering if anyone on this list would be willing to share
theirs.

Things I'm curious about:

1. Data size
2. Frequency for full snapshots
3. Frequency for copying snapshots off of the Cassandra nodes
4. Do you use the incremental backups feature?
5. Do you use commitlog archiving?
6. What method do you use to copy data off of the cluster (e.g. NFS, rsync,
rsync+ssh, etc.)?
7. Do you compress your backups, and if so, how soon (e.g. compress backups
older than N days)?
8. Do you use any off-the-shelf scripts for your backups (e.g. tablesnap,
cassandra_snapshotter, etc.)?
9. Do you utilise AWS for your backups, or do you keep them local (or
offsite on your own hardware)?
10. Anything else you'd like to add, especially if I missed something
important

I'm not asking for the best, perfect method for Cassandra backups. I'd just
like to see what others are doing and hopefully use some ideas to improve
our processes.

Thanks in advance for any responses, and sorry for the wall of text.

-Gene


RE: Repair taking long time

2014-09-26 Thread Gene Robichaux
Using their community edition... no support (yet!) :(

Gene Robichaux
Manager, Database Operations
Match.com
8300 Douglas Avenue | Suite 800 | Dallas, TX  75225
Phone: 214-576-3273

-Original Message-
From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On Behalf Of 
Jonathan Haddad
Sent: Friday, September 26, 2014 12:58 PM
To: user@cassandra.apache.org
Subject: Re: Repair taking long time

If you're using DSE you might want to contact Datastax support, rather than the 
ML.

On Fri, Sep 26, 2014 at 10:52 AM, Gene Robichaux  
wrote:
> I am on DSE 4.0.3 which is 2.0.7.
>
>
>
> If 4.5.1 is NOT 2.1, I guess an upgrade will not buy me much...
>
>
>
> The bad thing is that table is not our largest... :(
>
>
>
>
>
> Gene Robichaux
>
> Manager, Database Operations
>
> Match.com
>
> 8300 Douglas Avenue | Suite 800 | Dallas, TX  75225
>
> Phone: 214-576-3273
>
>
>
> From: Brice Dutheil [mailto:brice.duth...@gmail.com]
> Sent: Friday, September 26, 2014 12:47 PM
> To: user@cassandra.apache.org
> Subject: Re: Repair taking long time
>
>
>
> Unfortunately DSE 4.5.0 is still on 2.0.x
>
>
> -- Brice
>
>
>
> On Fri, Sep 26, 2014 at 7:40 PM, Jonathan Haddad  wrote:
>
> Are you using Cassandra 2.0 & vnodes?  If so, repair takes forever.
> This problem is addressed in 2.1.
>
>
> On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux 
>  wrote:
>> I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC 
>> and 4 in another.
>>
>>
>>
>> Running a repair on a large column family seems to be moving much 
>> slower than I expect.
>>
>>
>>
>> Looking at nodetool compactionstats, it indicates the Validation
>> phase is running and that the total bytes is 4.5T (4505336278756).
>>
>>
>>
>> This is a very large CF. The process has been running for 2.5 hours 
>> and has processed 71G (71950433062). That rate is about 28.4 GB per 
>> hour. At this rate it will take 158 hours, just shy of 1 week.
>>
>>
>>
>> Is this reasonable? This is my first large repair and I am wondering 
>> if this is normal for a CF of this size. Seems like a long time to 
>> me.
>>
>>
>>
>> Is it possible to tune this process to speed it up? Is there 
>> something in my configuration that could be causing this slow 
>> performance? I am running HDDs, not SSDs in a JBOD configuration.
>>
>>
>>
>>
>>
>>
>>
>> Gene Robichaux
>>
>> Manager, Database Operations
>>
>> Match.com
>>
>> 8300 Douglas Avenue | Suite 800 | Dallas, TX  75225
>>
>> Phone: 214-576-3273
>>
>>
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>
>



--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


RE: Repair taking long time

2014-09-26 Thread Gene Robichaux
I am on DSE 4.0.3 which is 2.0.7.

If 4.5.1 is NOT 2.1, I guess an upgrade will not buy me much...

The bad thing is that table is not our largest... :(


Gene Robichaux
Manager, Database Operations
Match.com
8300 Douglas Avenue | Suite 800 | Dallas, TX  75225
Phone: 214-576-3273

From: Brice Dutheil [mailto:brice.duth...@gmail.com]
Sent: Friday, September 26, 2014 12:47 PM
To: user@cassandra.apache.org
Subject: Re: Repair taking long time

Unfortunately DSE 4.5.0 is still on 2.0.x

-- Brice

On Fri, Sep 26, 2014 at 7:40 PM, Jonathan Haddad
<j...@jonhaddad.com> wrote:
Are you using Cassandra 2.0 & vnodes?  If so, repair takes forever.
This problem is addressed in 2.1.

On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux
<gene.robich...@match.com> wrote:
> I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4 in
> another.
>
>
>
> Running a repair on a large column family seems to be moving much slower
> than I expect.
>
>
>
> Looking at nodetool compactionstats, it indicates the Validation phase is
> running and that the total bytes is 4.5T (4505336278756).
>
>
>
> This is a very large CF. The process has been running for 2.5 hours and has
> processed 71G (71950433062). That rate is about 28.4 GB per hour. At this
> rate it will take 158 hours, just shy of 1 week.
>
>
>
> Is this reasonable? This is my first large repair and I am wondering if this
> is normal for a CF of this size. Seems like a long time to me.
>
>
>
> Is it possible to tune this process to speed it up? Is there something in my
> configuration that could be causing this slow performance? I am running
> HDDs, not SSDs in a JBOD configuration.
>
>
>
>
>
>
>
> Gene Robichaux
>
> Manager, Database Operations
>
> Match.com
>
> 8300 Douglas Avenue | Suite 800 | Dallas, TX  75225
>
> Phone: 214-576-3273
>
>


--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade



Repair taking long time

2014-09-26 Thread Gene Robichaux
I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4 in 
another.

Running a repair on a large column family seems to be moving much slower than I 
expect.

Looking at nodetool compactionstats, it indicates the Validation phase is
running and that the total bytes is 4.5T (4505336278756).

This is a very large CF. The process has been running for 2.5 hours and has 
processed 71G (71950433062). That rate is about 28.4 GB per hour. At this rate 
it will take 158 hours, just shy of 1 week.
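(Checking the arithmetic: 4,505 GB at 28.4 GB per hour is 4505 / 28.4,
roughly 158 hours, or about 6.6 days.)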

Is this reasonable? This is my first large repair and I am wondering if this is 
normal for a CF of this size. Seems like a long time to me.

Is it possible to tune this process to speed it up? Is there something in my 
configuration that could be causing this slow performance? I am running HDDs, 
not SSDs in a JBOD configuration.



Gene Robichaux
Manager, Database Operations
Match.com
8300 Douglas Avenue | Suite 800 | Dallas, TX  75225
Phone: 214-576-3273



Node Joining, Not Streaming

2014-09-24 Thread Gene Robichaux
I just added two nodes, one in DC-A and one in DC-B.

The node in DC-A came up and immediately started streaming files from its
peers. The node in DC-B has been in the JOINING state for nearly 24 hours and I
have not seen any streams start.
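(The quickest check I know of is nodetool netstats on the joining node,
which should list any active streaming sessions.)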

This cluster is in-house, spanning 2 DCs, and runs DataStax 4.0.3 (Cassandra
2.0.7). There are no errors reported; it just sits at JOINING. Here is the
GossipInfo for the cluster.

Any ideas?

/10.32.1.158
  NET_VERSION:7
  RACK:RAC1
  X_11_PADDING:{"workload":"Cassandra"}
  RPC_ADDRESS:0.0.0.0
  SCHEMA:90809305-a44e-3351-862d-f80dc7903ae1
  LOAD:2.8806352661E10
  RELEASE_VERSION:2.0.7.31
  SEVERITY:0.0
  HOST_ID:c1b4b919-e26e-4d34-b629-b531f9878fce
  DC:DPT
  STATUS:BOOT,-6892935543732432942
/10.32.1.157
  NET_VERSION:7
  RACK:RAC1
  X_11_PADDING:{"workload":"Cassandra","active":"true"}
  LOAD:2.1943389849E12
  SCHEMA:90809305-a44e-3351-862d-f80dc7903ae1
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:2.0.7.31
  SEVERITY:0.12536564469337463
  DC:DPT
  STATUS:NORMAL,4990283037218436151
  HOST_ID:472131ef-80ca-46bb-a9bd-6a04da3ceb30
/10.32.1.156
  NET_VERSION:7
  RACK:RAC1
  X_11_PADDING:{"workload":"Cassandra","active":"true"}
  LOAD:2.280831696114E12
  SCHEMA:90809305-a44e-3351-862d-f80dc7903ae1
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:2.0.7.31
  SEVERITY:0.12552301585674286
  DC:DPT
  STATUS:NORMAL,378597018791048251
  HOST_ID:2b1fc7a6-316a-4235-bfa9-2ff0f4f4de33
/10.32.1.155
  NET_VERSION:7
  RACK:RAC1
  X_11_PADDING:{"workload":"Cassandra","active":"true"}
  LOAD:2.141134386354E12
  SCHEMA:90809305-a44e-3351-862d-f80dc7903ae1
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:2.0.7.31
  SEVERITY:0.08361203968524933
  DC:DPT
  STATUS:NORMAL,-8844775018063727558
  HOST_ID:33ca33d6-384d-45d7-8ce9-08be7292f487
/10.1.1.152
  NET_VERSION:7
  RACK:RAC1
  X_11_PADDING:{"workload":"Cassandra","active":"true"}
  LOAD:1.396792762048E12
  SCHEMA:90809305-a44e-3351-862d-f80dc7903ae1
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:2.0.7.31
  SEVERITY:0.0
  DC:CLV
  STATUS:NORMAL,6140487838204514978
  HOST_ID:69868030-4c2a-4261-9586-1cc0c4423959
/10.1.1.153
  NET_VERSION:7
  RACK:RAC1
  X_11_PADDING:{"workload":"Cassandra"}
  LOAD:7.66529213407E11
  SCHEMA:90809305-a44e-3351-862d-f80dc7903ae1
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:2.0.7.31
  SEVERITY:0.0
  DC:CLV
  STATUS:BOOT,-5381045770254015948
  HOST_ID:90fd4296-90a2-44d3-8f5d-eda3479feddb
/10.1.1.150
  NET_VERSION:7
  RACK:RAC1
  X_11_PADDING:{"workload":"Cassandra","active":"true"}
  LOAD:1.776464445614E12
  SCHEMA:90809305-a44e-3351-862d-f80dc7903ae1
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:2.0.7.31
  SEVERITY:0.12552301585674286
  DC:CLV
  STATUS:NORMAL,1528801819777127078
  HOST_ID:e2a074fa-59c7-449f-b87d-d17f303f8774
/10.1.1.151
  NET_VERSION:7
  RACK:RAC1
  X_11_PADDING:{"workload":"Cassandra","active":"true"}
  LOAD:2.153245271245E12
  SCHEMA:90809305-a44e-3351-862d-f80dc7903ae1
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:2.0.7.31
  SEVERITY:0.0
  DC:CLV
  STATUS:NORMAL,-3082884198650260828
  HOST_ID:459aa667-39a5-46e7-9374-62f90108cb65
/10.1.1.154
  NET_VERSION:7
  RACK:RAC1
  X_11_PADDING:{"workload":"Cassandra","active":"true"}
  LOAD:1.304807703954E12
  SCHEMA:90809305-a44e-3351-862d-f80dc7903ae1
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:2.0.7.31
  SEVERITY:0.0
  DC:CLV
  STATUS:NORMAL,-5609228167613478907
  HOST_ID:f48c9bde-501f-496d-8f0f-fa92a03ebe7e

Gene Robichaux
Manager, Database Operations
8300 Douglas Avenue | Suite 800 | Dallas, TX  75225
Phone: 214-576-3273



Moving a server from one DC to another

2014-09-04 Thread Gene Robichaux
I have a situation where I need to move a server from one DC to another DC. I 
am using the PropertyFileSnitch and my cassandra-topology.properties looks like
this:

Server150=CLV:RAC1
Server151=CLV:RAC1
Server152=CLV:RAC1
Server153=DPT:RAC1
Server154=DPT:RAC1
Server155=DPT:RAC1
Server156=DPT:RAC1
Server157=DPT:RAC1
Server158=DPT:RAC1

I need to move Server153 and Server154 from the DPT DC to the CLV DC. The
servers are not going to physically move and the names/IPs will remain the same.

Server150=CLV:RAC1
Server151=CLV:RAC1
Server152=CLV:RAC1
Server153=CLV:RAC1
Server154=CLV:RAC1
Server155=DPT:RAC1
Server156=DPT:RAC1
Server157=DPT:RAC1
Server158=DPT:RAC1

What is the best way to do this?

Do I just need to change the cassandra-topology.properties and restart the 
nodes? Do I need to rebalance after?
Or do I need to decommission the servers one at a time, then re-bootstrap them
into the correct DC?

Any help would be appreciated.

Gene Robichaux
Manager, Database Operations
8300 Douglas Avenue | Suite 800 | Dallas, TX  75225
Phone: 214-576-3273