Cassandra DSC installation fails due to some Python dependencies. How to rectify?

2014-02-17 Thread Ertio Lew
I am trying to install cassandra dsc20, but the installation fails due to
some Python dependencies. How can I make this work?


root@server1:~# sudo apt-get install dsc20
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  cassandra libjna-java libopts25 ntp python python-minimal
python-support python2.7
  python2.7-minimal
Suggested packages:
  libjna-java-doc ntp-doc apparmor python-doc python-tk python2.7-doc
binutils binfmt-support
Recommended packages:
  perl
The following NEW packages will be installed:
  cassandra dsc20 libjna-java libopts25 ntp python python-minimal
python-support python2.7
  python2.7-minimal
0 upgraded, 10 newly installed, 0 to remove and 0 not upgraded.
Need to get 17.1 MB of archives.
After this operation, 23.2 MB of additional disk space will be used.
Do you want to continue [Y/n]? y
Get:1 http://debian.datastax.com/community/ stable/main cassandra all
2.0.5 [14.3 MB]
Get:2 http://us.archive.ubuntu.com/ubuntu/ raring/main libopts25 amd64
1:5.17.1-1ubuntu2 [62.2 kB]
Get:3 http://us.archive.ubuntu.com/ubuntu/ raring/main ntp amd64
1:4.2.6.p5+dfsg-2ubuntu1 [614 kB]
Get:4 http://us.archive.ubuntu.com/ubuntu/ raring/universe libjna-java
amd64 3.2.7-4 [416 kB]
Get:5 http://us.archive.ubuntu.com/ubuntu/ raring-security/main
python2.7-minimal amd64 2.7.4-2ubuntu3.2 [1223 kB]
Get:6 http://debian.datastax.com/community/ stable/main dsc20 all
2.0.5-1 [1302 B]
Get:7 http://us.archive.ubuntu.com/ubuntu/ raring-security/main
python2.7 amd64 2.7.4-2ubuntu3.2 [263 kB]
Get:8 http://us.archive.ubuntu.com/ubuntu/ raring/main python-minimal
amd64 2.7.4-0ubuntu1 [30.8 kB]
Get:9 http://us.archive.ubuntu.com/ubuntu/ raring/main python amd64
2.7.4-0ubuntu1 [169 kB]
Get:10 http://us.archive.ubuntu.com/ubuntu/ raring/universe
python-support all 1.0.15 [26.7 kB]
Fetched 17.1 MB in 3s (4842 kB/s)
Selecting previously unselected package libopts25.
(Reading database ... 27688 files and directories currently installed.)
Unpacking libopts25 (from .../libopts25_1%3a5.17.1-1ubuntu2_amd64.deb) ...
Selecting previously unselected package ntp.
Unpacking ntp (from .../ntp_1%3a4.2.6.p5+dfsg-2ubuntu1_amd64.deb) ...
Selecting previously unselected package libjna-java.
Unpacking libjna-java (from .../libjna-java_3.2.7-4_amd64.deb) ...
Selecting previously unselected package python2.7-minimal.
Unpacking python2.7-minimal (from
.../python2.7-minimal_2.7.4-2ubuntu3.2_amd64.deb) ...
Selecting previously unselected package python2.7.
Unpacking python2.7 (from .../python2.7_2.7.4-2ubuntu3.2_amd64.deb) ...
Selecting previously unselected package python-minimal.
Unpacking python-minimal (from .../python-minimal_2.7.4-0ubuntu1_amd64.deb) ...
Selecting previously unselected package python.
Unpacking python (from .../python_2.7.4-0ubuntu1_amd64.deb) ...
Selecting previously unselected package python-support.
Unpacking python-support (from .../python-support_1.0.15_all.deb) ...
Selecting previously unselected package cassandra.
Unpacking cassandra (from .../cassandra_2.0.5_all.deb) ...
Selecting previously unselected package dsc20.
Unpacking dsc20 (from .../archives/dsc20_2.0.5-1_all.deb) ...
Processing triggers for man-db ...
Processing triggers for desktop-file-utils ...
Setting up libopts25 (1:5.17.1-1ubuntu2) ...
Setting up ntp (1:4.2.6.p5+dfsg-2ubuntu1) ...
 * Starting NTP server ntpd
 [ OK ]
Setting up libjna-java (3.2.7-4) ...
Setting up python2.7-minimal (2.7.4-2ubuntu3.2) ...
# Empty sitecustomize.py to avoid a dangling symlink
Traceback (most recent call last):
  File "/usr/lib/python2.7/py_compile.py", line 170, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/py_compile.py", line 162, in main
    compile(filename, doraise=True)
  File "/usr/lib/python2.7/py_compile.py", line 106, in compile
    with open(file, 'U') as f:
IOError: [Errno 2] No such file or directory:
'/usr/lib/python2.7/sitecustomize.py'
dpkg: error processing python2.7-minimal (--configure):
 subprocess installed post-installation script returned error exit status 1
dpkg: dependency problems prevent configuration of python2.7:
 python2.7 depends on python2.7-minimal (= 2.7.4-2ubuntu3.2); however:
  Package python2.7-minimal is not configured yet.

dpkg: error processing python2.7 (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of python-minimal:
 python-minimal depends on python2.7-minimal (>= 2.7.4-1~); however:
  Package python2.7-minimal is not configured yet.

dpkg: error processing python-minimal (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of python:
 python depends on python2.7 (>= 2.7.4-1~); however:
  Package python2.7 is not configured yet.
 python depends on python-minimal (= 2.7.4-0ubuntu1); however:
  Package python-minimal is not configured yet.

dpkg: error processing python (--configure):
 dependency problems - leaving unconfigured
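
The traceback pinpoints the problem: python2.7-minimal's post-install script tries to byte-compile /usr/lib/python2.7/sitecustomize.py, which on Ubuntu is a symlink into /etc/python2.7/, and the symlink target is missing, so every package depending on it is left unconfigured. A minimal recovery sketch, assuming the dangling symlink is indeed the cause (the paths match the error above; verify with ls -l first):

# Recreate the file the symlink points at, then let dpkg finish the job.
sudo sh -c 'echo "# empty sitecustomize.py" > /etc/python2.7/sitecustomize.py'
sudo dpkg --configure -a     # finish configuring python2.7-minimal and the rest
sudo apt-get install -f      # clear any remaining broken-dependency state
sudo apt-get install dsc20   # retry the DSC install

If /usr/lib/python2.7/sitecustomize.py turns out not to be a symlink at all, create the empty file directly at that path instead.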

Re: How do I upgrade a single cassandra node in production to a 3-node cluster?

2014-02-16 Thread Ertio Lew
I just mean increasing the cluster size, not upgrading the Cassandra version.


On Mon, Feb 17, 2014 at 2:29 AM, spa...@gmail.com wrote:

 By upgrade do you mean only adding nodes or also moving up the version of
 C*?


 On Mon, Feb 17, 2014 at 2:23 AM, Erick Ramirez er...@ramirez.com.au wrote:

 Ertio,

 It's not so much upgrading, but simply adding more nodes to your existing
 setup.

 Cheers,
 Erick


 On Sun, Feb 16, 2014 at 2:13 PM, Ertio Lew ertio...@gmail.com wrote:

 I started off with a single cassandra node on my 2GB Digital Ocean VPS,
 but now I'm planning to upgrade it to a 3-node cluster. My single node
 contains around 10 GB of data spread across 10-12 column families.

 What should be the strategy to upgrade that to a 3-node cluster, bearing
 in mind that my data must remain safe on this production server?






 --
 http://spawgi.wordpress.com
 We can do it and do it better.



How do I upgrade a single cassandra node in production to a 3-node cluster?

2014-02-15 Thread Ertio Lew
I started off with a single cassandra node on my 2GB Digital Ocean VPS, but
now I'm planning to upgrade it to a 3-node cluster. My single node contains
around 10 GB of data spread across 10-12 column families.

What should be the strategy to upgrade that to a 3-node cluster, bearing in
mind that my data must remain safe on this production server?
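
As the replies above note, this is plain node addition rather than an upgrade. A rough per-node sketch, assuming a packaged install with config in /etc/cassandra/cassandra.yaml (IPs and paths are placeholders, not from the thread):

# On each new node, before first start, edit cassandra.yaml:
#   cluster_name: must match the existing node exactly
#   - seeds: "<ip-of-existing-node>"
#   listen_address / rpc_address: the new node's own address
sudo service cassandra start   # the node bootstraps and streams its token range
nodetool status                # wait for the new node to show as UN (Up/Normal)

# Once all new nodes have joined, on the original node:
nodetool cleanup               # discard data the old node no longer owns

If the replication factor is also being raised from 1, the keyspace would additionally need an ALTER KEYSPACE followed by nodetool repair on each node.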


Cassandra consuming too much memory in Ubuntu as compared to Windows, same machine.

2014-01-04 Thread Ertio Lew
I run a development Cassandra single-node server on both Ubuntu & Windows 8
on my dual-boot 4GB (RAM) machine.

I see that Cassandra runs fine under Windows without any crashes or OOMs;
however, in Ubuntu on the same machine it always gives an OOM message:

$ sudo service cassandra start
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4G -Xmx4G -Xmn800M
-XX:+HeapDumpOnOutOfMemoryError -Xss256k


Here is the memory usage (top output) for an empty Cassandra server in Ubuntu:
PID 1169  USER cassandr  PR 20  NI 0  VIRT 2639m  RES 1.3g  SHR 17m  S
%CPU 1  %MEM 33.9  TIME 0:53.80  COMMAND java

The memory usage while running under Windows, however, is very low relative
to this.

What is the reason behind this?

Also, how can I prevent these OOMs within Ubuntu? I am running DataStax's
DSC version 2.0.3.
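
The startup line above points at the likely cause: on Ubuntu the JVM is launched with -Xms4G -Xmx4G, i.e. a heap equal to the machine's entire 4GB of RAM, leaving nothing for the OS, page cache, or off-heap allocations; the Windows install evidently computed a smaller heap. A hedged fix is to pin the heap explicitly in /etc/cassandra/cassandra-env.sh (values illustrative for a 4GB dev box, not a recommendation from the thread):

# /etc/cassandra/cassandra-env.sh -- override the computed heap size.
MAX_HEAP_SIZE="1G"      # leave the bulk of the 4GB to the OS and page cache
HEAP_NEWSIZE="256M"     # young generation, conventionally ~1/4 of the heap

# Then restart:
sudo service cassandra restart

MAX_HEAP_SIZE and HEAP_NEWSIZE are the standard knobs cassandra-env.sh honors when both are set.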


Re: Why does Solandra store Solr data in Cassandra? Isn't Solr a complete solution?

2013-10-04 Thread Ertio Lew
Yes, and then what is SolrCloud for? It already provides clustering support,
so what's the need for Cassandra?


On Tue, Oct 1, 2013 at 2:06 AM, Sávio Teles savio.te...@lupa.inf.ufg.br wrote:


 Solr's index sitting on a single machine, even if that single machine can
 vertically scale, is a single point of failure.


 And what about SolrCloud?


 2013/9/30 Ken Hancock ken.hanc...@schange.com

 Yes.


 On Mon, Sep 30, 2013 at 1:57 PM, Andrey Ilinykh ailin...@gmail.com wrote:


 Also, be aware that while Cassandra has knobs to allow you to get
 consistent read results (CL=QUORUM), DSE Search does not. If a node drops
 messages for whatever reason (outage, dropped mutation, etc.), its Solr
 indexes will be inconsistent with other nodes in its replication group.

 Will repair fix it?




 --
 Ken Hancock | System Architect, Advanced Advertising
 SeaChange International
 50 Nagog Park
 Acton, Massachusetts 01720
 ken.hanc...@schange.com | www.schange.com |
 NASDAQ: SEAC http://www.schange.com/en-US/Company/InvestorRelations.aspx

 Office: +1 (978) 889-3329 | Google Talk: ken.hanc...@schange.com
 | Skype: hancockks | Yahoo IM: hancockks
 LinkedIn: http://www.linkedin.com/in/kenhancock

 SeaChange International http://www.schange.com/
 This e-mail and any attachments may contain information which is SeaChange
 International confidential. The information enclosed is intended only for
 the addressees herein and may not be copied or forwarded without permission
 from SeaChange International.




 --
 Regards,
 Sávio S. Teles de Oliveira
 voice: +55 62 9136 6996
 http://br.linkedin.com/in/savioteles
 MSc student in Computer Science - UFG
 Software Architect
 Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG



Re: What is the best way to install & upgrade Cassandra on Ubuntu?

2013-10-03 Thread Ertio Lew
Thanks for the clarifications!
Btw, DSC installs OpenJDK when Java is not present on your system. I don't
know why it doesn't just include the preferred Oracle JRE installation &
take care of later updates to it as well; that could be a reason to choose
DSC over the official Apache Debian package (as it would then be a complete
package to run Cassandra). Otherwise I can't see any strong reasons to
prefer it!?


On Fri, Oct 4, 2013 at 4:34 AM, Daniel Chia danc...@coursera.org wrote:

 OpsCenter is a separate package:
 http://www.datastax.com/documentation/opscenter/3.2/webhelp/index.html?pagename=docs&version=opscenter&file=index#opsc/install/opscInstallDeb_t.html

 Thanks,
 Daniel


 On Tue, Oct 1, 2013 at 8:11 PM, Aaron Morton aa...@thelastpickle.com wrote:

 Does DSC include other things like OpsCenter by default?

 Not sure, I've normally installed it with an existing cluster.

 Would it be possible to remove either of these installations while keeping
 the data intact & easily switch to the other, I mean switching from the
 DSC package to the Apache one or vice versa?

 Yes.
 Same code, same data.

 A

  -
 Aaron Morton
 New Zealand
 @aaronmorton

 Co-Founder & Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com

 On 30/09/2013, at 9:58 PM, Ertio Lew ertio...@gmail.com wrote:

 Thanks Aaron!

 Does DSC include other things like OpsCenter by default? I installed DSC
 on Linux, but OpsCenter wasn't installed there; yet when I tried on Windows
 it was installed along with a JRE & Python, using the Windows installer.

 Would it be possible to remove either of these installations while keeping
 the data intact & easily switch to the other, I mean switching from the
 DSC package to the Apache one or vice versa?


 On Mon, Sep 30, 2013 at 1:10 PM, Aaron Morton aa...@thelastpickle.com wrote:

 I am not sure if I should use DataStax's DSC or the official Debian
 packages from Apache Cassandra. How do I choose between them for a
 production server?

 They are technically the same.
 The DSC update will come out a little after the Apache release, and I
 _think_ they release for every Apache release.

  1.  when I upgrade to a newer version, would that retain my previous
 configurations so that I don't need to configure everything again ?

 Yes if you select that when doing the package install.

 2.  would that smoothly replace the previous installation by itself ?


 Yes


 3.  what's the way (kindly, if you can tell the command) to upgrade ?



 http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#upgrade/upgradeC_c.html#concept_ds_yqj_5xr_ck

 4. when should I prefer datastax's dsc to that ? (I need to install for
 production env.)

 Above

 Hope that helps.


  -
 Aaron Morton
 New Zealand
 @aaronmorton

 Co-Founder & Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com

 On 27/09/2013, at 11:01 PM, Ertio Lew ertio...@gmail.com wrote:

 I am not sure if I should use DataStax's DSC or the official Debian
 packages from Apache Cassandra. How do I choose between them for a
 production server?



 On Fri, Sep 27, 2013 at 11:02 AM, Ertio Lew ertio...@gmail.com wrote:


  Could you please clarify that:
 1.  when I upgrade to a newer version, would that retain my previous
 configurations so that I don't need to configure everything again ?
 2.  would that smoothly replace the previous installation by itself ?
 3.  what's the way (kindly, if you can tell the command) to upgrade ?
 4. when should I prefer datastax's dsc to that ? (I need to install for
 production env.)


 On Fri, Sep 27, 2013 at 12:50 AM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Sep 26, 2013 at 12:05 PM, Ertio Lew ertio...@gmail.com wrote:

 How do you install Cassandra on Ubuntu & later how do you upgrade the
 installation on the node when an update has arrived? Do you simply
 download & replace the latest tar.gz, untarring it to replace the older
 cassandra files? How do you do it? How does this upgrade process differ
 for a major version upgrade, like say switching from the 1.2 series to the
 2.0 series?


 Use the deb packages. To upgrade, install the new package. Only upgrade
 a single major version at a time, and be sure to consult NEWS.txt for any
 upgrade caveats.

 Also be aware of this sub-optimal behavior of the debian packages :

 https://issues.apache.org/jira/browse/CASSANDRA-2356

 =Rob










Re: What is the best way to install & upgrade Cassandra on Ubuntu?

2013-09-30 Thread Ertio Lew
Thanks Aaron!

Does DSC include other things like OpsCenter by default? I installed DSC
on Linux, but OpsCenter wasn't installed there; yet when I tried on Windows
it was installed along with a JRE & Python, using the Windows installer.

Would it be possible to remove either of these installations while keeping
the data intact & easily switch to the other, I mean switching from the DSC
package to the Apache one or vice versa?


On Mon, Sep 30, 2013 at 1:10 PM, Aaron Morton aa...@thelastpickle.com wrote:

 I am not sure if I should use DataStax's DSC or the official Debian
 packages from Apache Cassandra. How do I choose between them for a
 production server?

 They are technically the same.
 The DSC update will come out a little after the Apache release, and I
 _think_ they release for every Apache release.

  1.  when I upgrade to a newer version, would that retain my previous
 configurations so that I don't need to configure everything again ?

 Yes if you select that when doing the package install.

 2.  would that smoothly replace the previous installation by itself ?


 Yes


 3.  what's the way (kindly, if you can tell the command) to upgrade ?



 http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#upgrade/upgradeC_c.html#concept_ds_yqj_5xr_ck

 4. when should I prefer datastax's dsc to that ? (I need to install for
 production env.)

 Above

 Hope that helps.


 -
 Aaron Morton
 New Zealand
 @aaronmorton

 Co-Founder & Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com

 On 27/09/2013, at 11:01 PM, Ertio Lew ertio...@gmail.com wrote:

 I am not sure if I should use DataStax's DSC or the official Debian
 packages from Apache Cassandra. How do I choose between them for a
 production server?



 On Fri, Sep 27, 2013 at 11:02 AM, Ertio Lew ertio...@gmail.com wrote:


  Could you please clarify that:
 1.  when I upgrade to a newer version, would that retain my previous
 configurations so that I don't need to configure everything again ?
 2.  would that smoothly replace the previous installation by itself ?
 3.  what's the way (kindly, if you can tell the command) to upgrade ?
 4. when should I prefer datastax's dsc to that ? (I need to install for
 production env.)


 On Fri, Sep 27, 2013 at 12:50 AM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Sep 26, 2013 at 12:05 PM, Ertio Lew ertio...@gmail.com wrote:

 How do you install Cassandra on Ubuntu & later how do you upgrade the
 installation on the node when an update has arrived? Do you simply
 download & replace the latest tar.gz, untarring it to replace the older
 cassandra files? How do you do it? How does this upgrade process differ
 for a major version upgrade, like say switching from the 1.2 series to the
 2.0 series?


 Use the deb packages. To upgrade, install the new package. Only upgrade a
 single major version at a time, and be sure to consult NEWS.txt for any
 upgrade caveats.

 Also be aware of this sub-optimal behavior of the debian packages :

 https://issues.apache.org/jira/browse/CASSANDRA-2356

 =Rob







Why does Solandra store Solr data in Cassandra? Isn't Solr a complete solution?

2013-09-30 Thread Ertio Lew
Solr's data is stored on the file system as a set of index files
[http://stackoverflow.com/a/7685579/530153]. Then why do we need anything
like Solandra or DataStax Enterprise Search? Isn't Solr a complete solution
in itself? Why do we need to integrate it with Cassandra?


Among DataStax Community & the Cassandra Debian package, which to choose for a production install?

2013-09-28 Thread Ertio Lew
I think both provide the same thing, except DataStax Community also provides
some extras like OpsCenter, etc. But I cannot find OpsCenter installed when
I installed DSC on Ubuntu, although in the Windows installation I saw
OpsCenter & a JRE as well. So I think for DSC there is no prerequisite for
an Oracle JRE as there is for the Cassandra Debian package, is that so?

Btw, which is usually preferred for production installs?

I may need to use OpsCenter, but just occasionally.


Re: What is the best way to install & upgrade Cassandra on Ubuntu?

2013-09-27 Thread Ertio Lew
I am not sure if I should use DataStax's DSC or the official Debian packages
from Apache Cassandra. How do I choose between them for a production server?



On Fri, Sep 27, 2013 at 11:02 AM, Ertio Lew ertio...@gmail.com wrote:


  Could you please clarify that:
 1.  when I upgrade to a newer version, would that retain my previous
 configurations so that I don't need to configure everything again ?
 2.  would that smoothly replace the previous installation by itself ?
 3.  what's the way (kindly, if you can tell the command) to upgrade ?
 4. when should I prefer datastax's dsc to that ? (I need to install for
 production env.)


 On Fri, Sep 27, 2013 at 12:50 AM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Sep 26, 2013 at 12:05 PM, Ertio Lew ertio...@gmail.com wrote:

 How do you install Cassandra on Ubuntu & later how do you upgrade the
 installation on the node when an update has arrived? Do you simply
 download & replace the latest tar.gz, untarring it to replace the older
 cassandra files? How do you do it? How does this upgrade process differ
 for a major version upgrade, like say switching from the 1.2 series to the
 2.0 series?


 Use the deb packages. To upgrade, install the new package. Only upgrade a
 single major version at a time, and be sure to consult NEWS.txt for any
 upgrade caveats.

 Also be aware of this sub-optimal behavior of the debian packages :

 https://issues.apache.org/jira/browse/CASSANDRA-2356

 =Rob





What is the best way to install & upgrade Cassandra on Ubuntu?

2013-09-26 Thread Ertio Lew
How do you install Cassandra on Ubuntu & later how do you upgrade the
installation on the node when an update has arrived? Do you simply
download & replace the latest tar.gz, untarring it to replace the older
cassandra files? How do you do it? How does this upgrade process differ
for a major version upgrade, like say switching from the 1.2 series to the
2.0 series?


Re: What is the best way to install & upgrade Cassandra on Ubuntu?

2013-09-26 Thread Ertio Lew
 Could you please clarify that:
1.  when I upgrade to a newer version, would that retain my previous
configurations so that I don't need to configure everything again ?
2.  would that smoothly replace the previous installation by itself ?
3.  what's the way (kindly, if you can tell the command) to upgrade ?
4. when should I prefer datastax's dsc to that ? (I need to install for
production env.)


On Fri, Sep 27, 2013 at 12:50 AM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Sep 26, 2013 at 12:05 PM, Ertio Lew ertio...@gmail.com wrote:

 How do you install Cassandra on Ubuntu & later how do you upgrade the
 installation on the node when an update has arrived? Do you simply
 download & replace the latest tar.gz, untarring it to replace the older
 cassandra files? How do you do it? How does this upgrade process differ
 for a major version upgrade, like say switching from the 1.2 series to the
 2.0 series?


 Use the deb packages. To upgrade, install the new package. Only upgrade a
 single major version at a time, and be sure to consult NEWS.txt for any
 upgrade caveats.

 Also be aware of this sub-optimal behavior of the debian packages :

 https://issues.apache.org/jira/browse/CASSANDRA-2356

 =Rob
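
To make the deb-package flow concrete, here is a minimal sketch of an install and a later minor-version upgrade (a sketch assuming the standard Debian packaging and an already-configured apt repository; review NEWS.txt before any upgrade, as Rob says):

# Initial install:
sudo apt-get update
sudo apt-get install cassandra

# Later, upgrading within one major version:
nodetool drain                  # flush memtables so the commit log is clean
sudo service cassandra stop
sudo apt-get update
sudo apt-get install cassandra  # dpkg asks before overwriting cassandra.yaml
sudo service cassandra start

The configuration-file prompt is where the "keep my previous configuration" question from this thread gets answered interactively.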




Why don't you start off with a “single small” Cassandra server as you usually do with MySQL?

2013-09-18 Thread Ertio Lew
For any website just starting out, the load is minimal initially & grows at
a slow pace. People usually start their MySQL-based sites with a single
server (that too a VPS, not a dedicated server) running as both app server
and DB server, & usually get quite far with this setup; only as they feel
the need do they separate the DB from the app server, giving it a separate
VPS. This is how a startup expects things to be when planning resource
procurement.

But so far, what I have seen with Cassandra is something very different.
People usually recommend starting out with at least a 3-node cluster (on
dedicated servers) with lots & lots of RAM; 4GB or 8GB RAM is what they
suggest to start with. So is it that Cassandra requires more hardware
resources than MySQL for a website to deliver similar performance, serve a
similar load/traffic & the same amount of data? I understand the higher
storage requirements of Cassandra due to replication, but what about the
other hardware resources?

Can't we start off with Cassandra-based apps just like MySQL, starting with
1 or 2 VPSes & adding more whenever there's a need?

I don't want to compare apples with oranges. I just want to know how much
more dangerous a situation I may be in when I start out with a single-node
VPS-based Cassandra installation vs a single-node VPS-based MySQL
installation, and the difference between these two situations. Are Cassandra
servers more prone to being unavailable than MySQL servers? What is bad
about putting Tomcat alongside Cassandra, the way people use a LAMP stack on
a single server?

-


This question is also posted at StackOverflow here:
http://stackoverflow.com/questions/18462530/why-dont-you-start-off-with-a-single-small-cassandra-server-as-you-usually
& has an open bounty worth +50 rep.


Maintain backup for single node cluster

2013-09-05 Thread Ertio Lew
I would like to have a single-node Cassandra cluster initially, but to
maintain backups for that single node, how about occasionally & temporarily
adding a second node to the cluster as a replica (one that would contain the
backup; this could be my dev machine as well, far away from the first node
in some remote datacenter), so that data would be synchronized on both?

Would it be possible to do this? Maybe I could do this backup once every 2-3
days.
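
A join-then-leave replica would work but is heavyweight for a backup; the more conventional route for a single node is nodetool snapshot plus copying the snapshot off the box. A minimal sketch, assuming the default packaged data directory layout (the remote target is a placeholder):

nodetool flush                            # push memtables to disk as SSTables
nodetool snapshot -t backup-$(date +%F)   # hard-link SSTables under snapshots/
# Ship the snapshot directories to the remote machine:
rsync -az /var/lib/cassandra/data/*/*/snapshots/backup-$(date +%F) \
      user@devbox:/backups/cassandra/
nodetool clearsnapshot                    # reclaim space once the copy is done

Restoring is then a matter of placing the SSTables back under the matching keyspace/column-family directories before starting the node.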


CustomTThreadPoolServer.java: Error occurred during processing of message.

2013-08-29 Thread Ertio Lew
I suddenly started to encounter this weird issue after writing some data to
Cassandra. I don't know exactly what was written before this, or what caused
it to start happening.



ERROR [pool-2-thread-30] 2013-08-29 19:55:24,778
CustomTThreadPoolServer.java (line 205) Error occurred during processing of
message.
java.lang.StringIndexOutOfBoundsException: String index out of range:
-2147418111
        at java.lang.String.checkBounds(String.java:397)
        at java.lang.String.<init>(String.java:442)
        at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)
        at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

ERROR [pool-2-thread-31] 2013-08-29 19:55:24,910
CustomTThreadPoolServer.java (line 205) Error occurred during processing of
message.
java.lang.StringIndexOutOfBoundsException: String index out of range:
-2147418111
        at java.lang.String.checkBounds(String.java:397)
        at java.lang.String.<init>(String.java:442)
        at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)
        at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)


Any ideas??


Re: CustomTThreadPoolServer.java: Error occurred during processing of message.

2013-08-29 Thread Ertio Lew
Running Cassandra (1.0.0 final), single node, with default configuration on
a Windows dev machine. Using Hector.


On Thu, Aug 29, 2013 at 10:50 PM, Ertio Lew ertio...@gmail.com wrote:

 I suddenly started to encounter this weird issue after writing some data
 to Cassandra. I don't know exactly what was written before this, or what
 caused it to start happening.



 ERROR [pool-2-thread-30] 2013-08-29 19:55:24,778
 CustomTThreadPoolServer.java (line 205) Error occurred during processing of
 message.
 java.lang.StringIndexOutOfBoundsException: String index out of range:
 -2147418111
         at java.lang.String.checkBounds(String.java:397)
         at java.lang.String.<init>(String.java:442)
         at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
         at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)
         at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
         at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:662)

 ERROR [pool-2-thread-31] 2013-08-29 19:55:24,910
 CustomTThreadPoolServer.java (line 205) Error occurred during processing of
 message.
 java.lang.StringIndexOutOfBoundsException: String index out of range:
 -2147418111
         at java.lang.String.checkBounds(String.java:397)
         at java.lang.String.<init>(String.java:442)
         at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
         at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)
         at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
         at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:662)

 Any ideas??




Re: Which of these VPS configurations would perform better for Cassandra ?

2013-08-06 Thread Ertio Lew
Amazon seems to overprice its services by a lot. If you look at a
similar-sized deployment elsewhere, like Linode or Digital Ocean (very
competitive pricing), you'll notice huge differences. OK, some services &
features are extra, but maybe we don't all necessarily need them; & if you
can host on non-dedicated virtual servers on Amazon, you can also do it on
similar-configuration nodes elsewhere.

IMO these huge costs associated with a Cassandra deployment are too heavy
for small startups just starting out. I believe a deployment for a similar
application using MySQL would be quite a bit cheaper/more affordable (though
I'm not exactly sure); at least you don't usually create a cluster from the
beginning. Probably we made a wrong decision choosing Cassandra considering
only its technological advantages.


Re: Which of these VPS configurations would perform better for Cassandra ?

2013-08-04 Thread Ertio Lew
@David:
Like all other start-ups, we too cannot start with all dedicated servers for
Cassandra. So right now we have no better choice than using a VPS :), but we
can definitely choose one from amongst a suitable set of VPS configurations.
Since we are just starting out, could we initiate our cluster with 2 nodes
(RF=2) (KVM, 2GB RAM, 2 cores, 30GB SSD)? We won't be having a very heavy
load on Cassandra for the next few months, until we grow our user base. So
this choice is mainly based on pricing vs configuration, as well as Digital
Ocean's good reputation in the community.


On Sun, Aug 4, 2013 at 12:53 AM, David Schairer dschai...@humbaba.net wrote:

 I've run several lab configurations on linodes; I wouldn't run cassandra
 on any shared virtual platform for large-scale production, just because
 your IO performance is going to be really hard to predict.  Lots of people
 do, though -- depends on your cassandra loads and how consistent you need
 to have performance be, as well as how much of your working set will fit
 into memory.  Remember that linode significantly oversells their CPU as
 well.

 The release version of KVM, at least as of a few months ago, still doesn't
 support TRIM on SSD; that, plus the fact that you don't know how others
 will use SSDs or if their file systems will keep the SSDs healthy, means
 that SSD performance on KVM is going to be highly unpredictable.  I have
 not tested digitalocean, but I did test several other KVM+SSD shared-tenant
 hosting providers aggressively for cassandra a couple months ago; they all
 failed badly.

 Your mileage will vary considerably based on what you need out of
 cassandra, what your data patterns look like, and how you configure your
 system.  That said, I would use xen before KVM for high-performance IO.

 I have not run Cassandra in any volume on Amazon -- lots of folks have,
 and may have recommendations (including SSD) there for where it falls on
 the price/performance curve.

 --DRS

 On Aug 3, 2013, at 11:33 AM, Ertio Lew ertio...@gmail.com wrote:

  I am building a cluster (initially starting with a 2-3 node cluster). I
 have come across two seemingly good options for hosting, Linode & Digital
 Ocean. The VPS configuration for both is listed below:
 
 
  Linode:-
  --
  XEN Virtualization
  2 GB RAM
  8 cores CPU (2x priority) (8 processor Xen instances)
  96 GB Storage
 
 
  Digital Ocean:-
  -
  KVM Virtualization
  2GB Memory
  2 Cores
  40GB SSD disk
  Digital Ocean's VPS is at half the price of the Linode VPS listed above.
 
 
  Could you clarify which of these two VPS would be better as Cassandra
 nodes ?
 
 




Which of these VPS configurations would perform better for Cassandra ?

2013-08-03 Thread Ertio Lew
I am building a cluster (initially starting with a 2-3 node cluster). I
have come across two seemingly good options for hosting, Linode & Digital
Ocean. The VPS configuration for both is listed below:


Linode:-
--
XEN Virtualization
2 GB RAM
8 cores CPU (2x priority) (8 processor Xen instances)
96 GB Storage


Digital Ocean:-
-
KVM Virtualization
2GB Memory
2 Cores
40GB SSD disk
Digital Ocean's VPS is at half the price of the Linode VPS listed above.


Could you clarify which of these two VPS would be better as Cassandra nodes?


Re:

2013-04-18 Thread Ertio Lew
I use Hector.


On Thu, Apr 18, 2013 at 1:35 PM, aaron morton aa...@thelastpickle.com wrote:

  ERROR 08:40:42,684 Error occurred during processing of message.
  java.lang.StringIndexOutOfBoundsException: String index out of range:
  -2147418111
          at java.lang.String.checkBounds(String.java:397)
          at java.lang.String.<init>(String.java:442)
          at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
          at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)
 This is an error when the server is trying to read what the client has
 sent.

  Is this caused due to my application putting any corrupted data?
 Looks that way. What client are you using ?

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 18/04/2013, at 3:21 PM, Ertio Lew ertio...@gmail.com wrote:

 I run cassandra on a single Win 8 machine for development needs.
 Everything has been working fine for several months, but just today I saw
 this error message in the cassandra logs & all host pools were marked down.
 
 
 ERROR 08:40:42,684 Error occurred during processing of message.
 java.lang.StringIndexOutOfBoundsException: String index out of range:
 -2147418111
         at java.lang.String.checkBounds(String.java:397)
         at java.lang.String.<init>(String.java:442)
         at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
         at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)
         at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
         at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:662)
 
 
 After restarting the server, everything worked fine again.
 I am curious to know what this is related to. Is it caused by my
 application putting in corrupted data?
 
 




[no subject]

2013-04-17 Thread Ertio Lew
I run cassandra on a single Win 8 machine for development needs. Everything
has been working fine for several months, but just today I saw this error
message in the cassandra logs & all host pools were marked down.



ERROR 08:40:42,684 Error occurred during processing of message.
java.lang.StringIndexOutOfBoundsException: String index out of range:
-2147418111
        at java.lang.String.checkBounds(String.java:397)
        at java.lang.String.<init>(String.java:442)
        at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)
        at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)


After restarting the server, everything worked fine again.
I am curious to know what this is related to. Is it caused by my application
putting in corrupted data?


Re: Is it bad putting columns with composite or integer names in a CF with BytesType comparator & validator?

2012-11-01 Thread Ertio Lew
Thoughts, please ?


On Thu, Nov 1, 2012 at 7:12 PM, Ertio Lew ertio...@gmail.com wrote:

 Would that do any harm, or are there any downsides, if I store columns
 with composite names or IntegerType names in a column family with BytesType
 comparator & validator? I have observed that the BytesType comparator sorts
 integer-named columns in a similar fashion to the IntegerType comparator,
 so why should I lock my CF to storing just integer- or composite-named
 columns? It would be good if I could just mix different datatypes in the
 same column family, no!?
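
The observation about BytesType holds because fixed-width big-endian encodings of non-negative integers compare byte-wise in the same order as they do numerically. A small illustration with shell tools (not from the thread):

# 4-byte big-endian encodings: 5 -> 00000005, 256 -> 00000100 (hex).
printf '\x00\x00\x00\x05' | xxd -p    # 00000005
printf '\x00\x00\x01\x00' | xxd -p    # 00000100
# Byte-wise (BytesType) order: 00000005 < 00000100, matching 5 < 256.

The caveats are that negative two's-complement values begin with a 0x80-0xFF byte, so under BytesType they sort after all non-negative values, and values of different widths compare by prefix; IntegerType's sign- and length-aware comparison avoids both quirks.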


Re: Option for ordering columns by timestamp in CF

2012-10-13 Thread Ertio Lew
@B Todd Burruss:
Regarding the use cases, I think they are pretty common; at least I see this
come up very frequently in my project. Let's say the application needs to
store a timeline of bookmark activity by a user on certain items. If I could
store the activity data as columns (with the concerned item id as the column
name) & get them ordered by timestamp, then I could also fetch from that row
whether or not a particular item was bookmarked by the user.
Ordering columns by time is a very common requirement in any application, so
if such a mechanism were provided by cassandra, it would be really useful &
convenient for app developers.

On Sat, Oct 13, 2012 at 8:50 PM, Martin Koch m...@issuu.com wrote:

 One example could be to identify when a row was last updated. For example,
 if I have a column family for storing users, the row key is a user ID and
 the columns are values for that user; e.g. natural column names would be
 firstName, lastName, address, etc; column names don't naturally include a
 date here.

 Sorting the columns by timestamp and picking the last would allow me to
 know when the row was last modified. (I could manually maintain a 'last
 modified' column as well, I know, but I'm just coming up with a use case :).

 /Martin Koch


 On Fri, Oct 12, 2012 at 11:39 PM, B. Todd Burruss bto...@gmail.com wrote:

 Trying to think of a use case where you would want to order by
 timestamp, and also have unique column names for direct access.

 Not really trying to challenge the use case, but you can get ordering
 by timestamp and still maintain a name for the column using
 composites. If the first component of the composite is a timestamp,
 then you can order on it. When retrieved, you could have a name
 in the second component... and have dupes, as long as the timestamp is
 unique (use TimeUUID).


 On Fri, Oct 12, 2012 at 7:20 AM, Derek Williams de...@fyrie.net wrote:
  You probably already know this but I'm pretty sure it wouldn't be a
 trivial
  change, since to efficiently lookup a column by name requires the
 columns to
  be ordered by name. A separate index would be needed in order to provide
  lookup by column name if the row was sorted by timestamp (which is the
  way Redis implements its sorted set).
 
 
  On Fri, Oct 12, 2012 at 12:13 AM, Ertio Lew ertio...@gmail.com wrote:
 
  Make column timestamps optional? Kidding me, right? :) I do understand
  that this won't be possible, as then cassandra won't be able to
  distinguish the latest among several copies of the same column. I don't
  mean that. I just want that while ordering the columns, Cassandra (in an
  optional mode per CF) should not look at column names (they will exist,
  though, but for retrieval purposes, not for ordering); instead Cassandra
  would order the columns by looking at the timestamp values (timestamps
  would exist!). So the change would be just to provide a mode in which
  cassandra, while ordering, uses timestamps instead of column names.
 
 
  On Fri, Oct 12, 2012 at 2:26 AM, Tyler Hobbs ty...@datastax.com
 wrote:
 
  Without thinking too deeply about it, this is basically equivalent to
  disabling timestamps for a column family and using timestamps for column
  names, though in a very indirect (and potentially confusing) manner. So,
  if you want to open a ticket, I would suggest framing it as "make column
  timestamps optional".
 
 
  On Wed, Oct 10, 2012 at 4:44 AM, Ertio Lew ertio...@gmail.com
 wrote:
 
  I think Cassandra should provide a configurable option on a per column
  family basis to sort columns by timestamp rather than by column names.
  This would be really helpful to maintain time-sorted columns without
  using up the column name as a timestamp, which might otherwise be used to
  store more relevant column names useful for retrievals. Very frequently
  we need to store data sorted in time order. Therefore I think this may be
  a very general requirement & not specific to just my use case alone.

  Does it make sense to create an issue for this?
 
 
 
 
  On Fri, Mar 25, 2011 at 2:38 AM, aaron morton 
 aa...@thelastpickle.com
  wrote:
 
  If you mean order by the column timestamp (as passed by the client),
  that is not possible.
 
  Can you use your own timestamps as the column name and store them as
  long values ?
 
  Aaron
 
  On 25 Mar 2011, at 09:30, Narendra Sharma wrote:
 
   Cassandra 0.7.4
   Column names in my CF are of type byte[] but I want to order columns
   by timestamp. What is the best way to achieve this? Does it make sense
   for Cassandra to support ordering of columns by timestamp as an option
   for a column family, irrespective of the column name type?
  
   Thanks,
   Naren
 
 
 
 
 
  --
  Tyler Hobbs
  DataStax
 
 
 
 
 
  --
  Derek Williams
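
For reference, the composite layout Todd describes can be declared directly. A cassandra-cli sketch of that era, with illustrative names (not from the thread):

create column family UserActivity
  with comparator = 'CompositeType(TimeUUIDType,UTF8Type)'
  and key_validation_class = UTF8Type
  and default_validation_class = UTF8Type;

Columns then sort by the TimeUUID component first, so a reversed slice of a row returns the most recent entries, while the second component still carries a readable name.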
 





Re: Option for ordering columns by timestamp in CF

2012-10-12 Thread Ertio Lew
Make column timestamps optional? Kidding me, right? :) I do understand that
this won't be possible, as then cassandra won't be able to distinguish the
latest among several copies of the same column. I don't mean that. I just
want that while ordering the columns, Cassandra (in an optional mode per CF)
should not look at column names (they will exist, though, but for retrieval
purposes, not for ordering); instead Cassandra would order the columns by
looking at the timestamp values (timestamps would exist!). So the change
would be just to provide a mode in which cassandra, while ordering, uses
timestamps instead of column names.

On Fri, Oct 12, 2012 at 2:26 AM, Tyler Hobbs ty...@datastax.com wrote:

 Without thinking too deeply about it, this is basically equivalent to
 disabling timestamps for a column family and using timestamps for column
 names, though in a very indirect (and potentially confusing) manner.  So,
 if you want to open a ticket, I would suggest framing it as "make column
 timestamps optional".


 On Wed, Oct 10, 2012 at 4:44 AM, Ertio Lew ertio...@gmail.com wrote:

 I think Cassandra should provide a configurable option on a per column
 family basis to sort columns by timestamp rather than by column names.
 This would be really helpful to maintain time-sorted columns without using
 up the column name as a timestamp, which might otherwise be used to store
 more relevant column names useful for retrievals. Very frequently we need
 to store data sorted in time order. Therefore I think this may be a very
 general requirement & not specific to just my use case alone.

 Does it make sense to create an issue for this?




 On Fri, Mar 25, 2011 at 2:38 AM, aaron morton aa...@thelastpickle.com wrote:

 If you mean order by the column timestamp (as passed by the client), that
 is not possible.

 Can you use your own timestamps as the column name and store them as
 long values ?

 Aaron

 On 25 Mar 2011, at 09:30, Narendra Sharma wrote:

  Cassandra 0.7.4
  Column names in my CF are of type byte[] but I want to order columns
 by timestamp. What is the best way to achieve this? Does it make sense for
 Cassandra to support ordering of columns by timestamp as an option for a
 column family, irrespective of the column name type?
 
  Thanks,
  Naren





 --
 Tyler Hobbs
 DataStax http://datastax.com/




Re: Option for ordering columns by timestamp in CF

2012-10-10 Thread Ertio Lew
I think Cassandra should provide a configurable option on a per column
family basis to sort columns by timestamp rather than by column names. This
would be really helpful to maintain time-sorted columns without using up the
column name as a timestamp, which might otherwise be used to store more
relevant column names useful for retrievals. Very frequently we need to
store data sorted in time order. Therefore I think this may be a very
general requirement & not specific to just my use case alone.

Does it make sense to create an issue for this?



On Fri, Mar 25, 2011 at 2:38 AM, aaron morton aa...@thelastpickle.com wrote:

 If you mean order by the column timestamp (as passed by the client), that
 is not possible.

 Can you use your own timestamps as the column name and store them as long
 values ?

 Aaron

 On 25 Mar 2011, at 09:30, Narendra Sharma wrote:

  Cassandra 0.7.4
  Column names in my CF are of type byte[] but I want to order columns by
 timestamp. What is the best way to achieve this? Does it make sense for
 Cassandra to support ordering of columns by timestamp as an option for a
 column family, irrespective of the column name type?
 
  Thanks,
  Naren




Re: RF on per column family basis ?

2012-07-28 Thread Ertio Lew
I heard that it is not highly recommended to create more than a single
keyspace per application, or on a single cluster!?

Moreover, I fail to understand why Cassandra places this limitation of
setting the RF per keyspace when, I guess, it makes more sense to do it on a
per-CF basis!?


Re: Schema advice: (Single row or multiple rows!?) How do I store millions of columns when I need to read a set of around 500 columns in a single read query using column names?

2012-07-23 Thread Ertio Lew
Actually these columns are one per entity in my application, & I need to
query, at any time, the columns for a list of 300-500 entities in one go.


Re: Schema advice: (Single row or multiple rows!?) How do I store millions of columns when I need to read a set of around 500 columns in a single read query using column names?

2012-07-23 Thread Ertio Lew
For each user in my application, I want to store a *value* that is queried
by using the userId. So there is going to be one column for each user
(userId as the column name & *value* as the column value). Now I want to
store these columns such that I can efficiently read the columns for at
least 300-500 users in a single read query.


Re: Schema advice: (Single row or multiple rows!?) How do I store millions of columns when I need to read a set of around 500 columns in a single read query using column names?

2012-07-23 Thread Ertio Lew
I want to read the columns for a randomly selected list of userIds
(completely random). I fetch the data using userIds (which would be used as
column names in the case of a single row, or as row keys in the case of one
row per user) for a selected list of users. Assume that the application
knows the list of userIds it has to demand from the DB.


Schema advice: (Single row or multiple rows!?) How do I store millions of columns when I need to read a set of around 500 columns in a single read query using column names?

2012-07-22 Thread Ertio Lew
I want to store hundreds of millions of columns (containing id1-to-id2
mappings) in the DB & at any single time retrieve a set of about 200-500
columns, based on the column names (id1) if they are in a single row, or
using row keys if each column is stored in a unique row.


If I put them in a single row:

- the disadvantage is that the number of columns is quite big, which would
lead to uneven load distribution, etc.
- the plus is that I can easily read all the columns I want to fetch using
column names, doing a single row read


But if I store each of them in a separate row:

- I will have to read hundreds of rows (300-500, or in rare cases up
to 1000) at a single time, which may lead to bad read performance (!?)
- it is a bit less space efficient


What schema should I go with?


How to make search by column names in a range case-insensitive?

2012-05-14 Thread Ertio Lew
I need to make a search-by-name index using entity names as column names in
a row. This data is split across several rows, using the first 3 characters
of the entity name as the row key & the remaining part as the column name;
the column value contains the entity id.

But there is a problem: I'm storing this data in a CF using the BytesType
comparator, and I need to make case-insensitive queries that retrieve 'n'
columns by column name, starting from a given point.
Any ideas about how I should do that?


How do I add a custom comparator class to a cassandra cluster ?

2012-05-14 Thread Ertio Lew
I need to add a custom comparator to a cluster, to sort columns in a
certain customized fashion. How do I add the class to the cluster?


Re: How do I add a custom comparator class to a cassandra cluster ?

2012-05-14 Thread Ertio Lew
Can I put this comparator class in a separate new jar (with just this
single file), or does it have to be appended to the original jar along with
the other comparator classes?

On Tue, May 15, 2012 at 12:22 AM, Tom Duffield (Mailing Lists) 
tom.duffield.li...@gmail.com wrote:

 Kirk is correct.

 --
 Tom Duffield (Mailing Lists)
 Sent with Sparrow http://www.sparrowmailapp.com/?sig

 On Monday, May 14, 2012 at 1:41 PM, Kirk True wrote:

 Disclaimer: I've never tried, but I'd imagine you can drop a JAR
 containing the class(es) into the lib directory and perform a rolling
 restart of the nodes.

 On 5/14/12 11:11 AM, Ertio Lew wrote:

 I need to add a custom comparator to a cluster, to sort columns in a
 certain customized fashion. How do I add the class to the cluster ?
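
A sketch of the drop-in approach Kirk describes, assuming a packaged install that keeps Cassandra's jars under /usr/share/cassandra/lib (the jar name is illustrative; paths vary by install):

# On each node in turn (a rolling restart):
sudo cp my-comparator.jar /usr/share/cassandra/lib/
sudo service cassandra restart
nodetool ring    # confirm the node is back Up before moving to the next one

Brandon's warning in the follow-up thread still applies: once data is written under a custom comparator, every node needs that jar on its classpath from then on.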





Re: How do I add a custom comparator class to a cassandra cluster ?

2012-05-14 Thread Ertio Lew
@Brandon: I just created a JIRA issue to request that this type of
comparator ship with Cassandra.

It is about a UTF8 comparator that provides case-insensitive ordering of
columns.
See the issue here: https://issues.apache.org/jira/browse/CASSANDRA-4245

On Tue, May 15, 2012 at 11:14 AM, Brandon Williams dri...@gmail.com wrote:

 On Mon, May 14, 2012 at 1:11 PM, Ertio Lew ertio...@gmail.com wrote:
  I need to add a custom comparator to a cluster, to sort columns in a
  certain customized fashion. How do I add the class to the cluster?

 I highly recommend against doing this, because you'll be locked in to
 your comparator and not have an easy way out.  I dare say if none of
 the currently available comparators meet your needs, you're doing
 something wrong.

 -Brandon



Re: Schema advice/help

2012-03-27 Thread Ertio Lew
@R. Verlangen:
You are suggesting keeping a single row for all activities & reading all the
columns from the row & then filtering, right!?

If done that way (instead of keeping it in 5 rows), I would need to retrieve
100s-200s of columns from a single row rather than just 50 columns if I keep
them in 5 rows. Which of these two would be better: more columns from a
single row, or fewer columns from multiple rows?

On Tue, Mar 27, 2012 at 2:27 PM, R. Verlangen ro...@us2.nl wrote:

 You can just get a slice range with as start userId: and no end.


 2012/3/27 Maciej Miklas mac.mik...@googlemail.com

 multiget would require the Order Preserving Partitioner, and this can lead
 to an unbalanced ring and hot spots.

 Maybe you can use a secondary index on itemtype; it must have small
 cardinality:
 http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/




 On Tue, Mar 27, 2012 at 10:10 AM, Guy Incognito dnd1...@gmail.com wrote:

 Without the ability to do disjoint column slices, I would probably use 5
 different rows:

 userId:itemType - activityId

 Then it's a multiget slice of 10 items from each of your 5 rows.


 On 26/03/2012 22:16, Ertio Lew wrote:

 I need to store activities by each user, on 5 item types. I always want
 to read the last 10 activities on each item type by a user (i.e., total
 activities to read at a time = 50).

 I want to store these activities in a single row for each user so that
 they can be retrieved in a single-row query, since I want to read all the
 last 10 activities on each item. I am thinking of creating composite names
 appending itemtype : activityId (activityId is just a timestamp value),
 but then I don't see how to read the last 10 activities from all
 itemtypes.

 Any ideas about a better schema for doing this?

 Any ideas about schema to do this better way ?






 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl




Re: Adding Long type rows to a CF containing Integer(32) type row keys, without overlapping ?

2012-03-26 Thread Ertio Lew
I need to use the range beyond the int32 range, so I am using Long to write
those keys. I am afraid this might lead to collisions with the previously
stored integer keys in the same CF, even if I leave out the int32 range.

On Mon, Mar 26, 2012 at 10:51 PM, aaron morton aa...@thelastpickle.com wrote:

 without them overlapping/disturbing each other (assuming that keys lie in
 above domains) ?

 Not sure what you mean by overlapping.

 42 as a int and 42 as a long are the same key.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 25/03/2012, at 9:47 PM, Ertio Lew wrote:

 I have been writing rows to a CF, all with integer (4-byte) keys. So my CF
 contains rows with keys in the entire range from Integer.MIN_VALUE to
 Integer.MAX_VALUE.

 Now I want to store Long-type keys as well in this CF, without disturbing
 the integer keys. The range of the Long-type keys would exclude the
 integers' range, i.e. (-2^63 to -2^31) and (2^31 to 2^63).

 Would it be safe to mix the integer & long keys in a single CF without
 them overlapping/disturbing each other (assuming that the keys lie in the
 above domains)?





Schema advice/help

2012-03-26 Thread Ertio Lew
I need to store activities by each user, on 5 item types. I always want to
read the last 10 activities on each item type by a user (i.e., total
activities to read at a time = 50).

I want to store these activities in a single row for each user so that they
can be retrieved in a single-row query, since I want to read all the last 10
activities on each item. I am thinking of creating composite names appending
itemtype : activityId (activityId is just a timestamp value), but then I
don't see how to read the last 10 activities from all itemtypes.

Any ideas about a better schema for doing this?


Adding Long type rows to a CF containing Integer(32) type row keys, without overlapping ?

2012-03-25 Thread Ertio Lew
I have been writing rows to a CF, all with integer (4-byte) keys. So my CF
contains rows with keys in the entire range from Integer.MIN_VALUE to
Integer.MAX_VALUE.

Now I want to store Long-type keys as well in this CF, without disturbing
the integer keys. The range of the Long-type keys would exclude the
integers' range, i.e. (-2^63 to -2^31) and (2^31 to 2^63).

Would it be safe to mix the integer & long keys in a single CF without them
overlapping/disturbing each other (assuming that the keys lie in the above
domains)?


Re: Fwd: information on cassandra

2012-03-25 Thread Ertio Lew
I guess a 2-node cluster with RF=2 might also be a starting point, isn't it?
Are there any issues with this?

On Sun, Mar 25, 2012 at 12:20 AM, samal samalgo...@gmail.com wrote:

 Cassandra has a distributed architecture, so 1 node does not fit into it.
 It can be used that way, but you lose its benefits; it's OK if you are just
 playing around. Use VMs to learn how the cluster communicates and handles
 requests.

 To get full tolerance, redundancy, and consistency, a minimum of 3 nodes
 is required.

 Important reading here:
 http://wiki.apache.org/cassandra/
 http://www.datastax.com/docs/1.0/index
 http://thelastpickle.com/
 http://www.acunu.com/blogs/all/



 On Sat, Mar 24, 2012 at 11:37 PM, Garvita Mehta garvita.me...@tcs.com wrote:

 It's not advisable to use cassandra on a single node, as its basic
 definition says that if a node fails, data still remains in the system; at
 least 3 nodes must be present when setting up a cassandra cluster.


 Garvita Mehta
 CEG - Open Source Technology Group
 Tata Consultancy Services
 Ph:- +91 22 67324756
 Mailto: garvita.me...@tcs.com
 Website: http://www.tcs.com
 
 Experience certainty. IT Services
 Business Solutions
 Outsourcing
 

 -puneet loya wrote: -

 To: user@cassandra.apache.org
 From: puneet loya puneetl...@gmail.com
 Date: 03/24/2012 06:36PM
 Subject: Fwd: information on cassandra




 hi,

 I'm Puneet, an engineering student. I would like to know whether cassandra
 is useful considering we just have a single node (rather, a single system)
 holding all the information.
 I'm looking for decent response times from the database. Can you please
 respond?

 Thank you ,

 Regards,

 Puneet Loya






Re: Using cassandra at minimal expenditures

2012-03-01 Thread Ertio Lew
expensive :-) I was expecting to start with 2GB nodes, if not 1GB,
initially.

On Thu, Mar 1, 2012 at 3:43 PM, aaron morton aa...@thelastpickle.com wrote:

 As others have said, it depends on load and traffic and all sorts of things.

 If you want a number, 4GB would be a reasonable minimum IMHO (you may get
 by with less). 8GB is about the top.
 Any memory not allocated to Cassandra will be used to map files into
 memory.

 If you can get machines with 8GB RAM, that's a reasonable start.

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 1/03/2012, at 1:16 AM, Maki Watanabe wrote:

 Depends on your traffic :-)

 cassandra-env.sh will try to allocate the heap with the following formula
 if you don't specify MAX_HEAP_SIZE:
 1. calculate 1/2 of the RAM on your system and cap it at 1024MB
 2. calculate 1/4 of the RAM on your system and cap it at 8192MB
 3. pick the larger value

 So how about starting with the default? You will need to monitor the heap
 usage at first.
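In code form, the rule Maki describes works out as below (a tiny standalone sketch of the formula, not the actual cassandra-env.sh script):

public class DefaultHeap {
    // max(min(ram/2, 1024MB), min(ram/4, 8192MB))
    static long defaultHeapMb(long systemRamMb) {
        long half = Math.min(systemRamMb / 2, 1024);    // 1/2 of RAM, capped at 1024MB
        long quarter = Math.min(systemRamMb / 4, 8192); // 1/4 of RAM, capped at 8192MB
        return Math.max(half, quarter);                 // pick the larger value
    }

    public static void main(String[] args) {
        System.out.println(defaultHeapMb(1024)); // 512  -> a 1GB node gets a 512MB heap
        System.out.println(defaultHeapMb(2048)); // 1024 -> a 2GB node gets a 1GB heap
        System.out.println(defaultHeapMb(8192)); // 2048 -> an 8GB node gets a 2GB heap
    }
}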

 2012/2/29 Ertio Lew ertio...@gmail.com:

 Thanks, I think I don't need high consistency (as per my app requirements),
 so I might be fine with CL.ONE instead of quorum; I'm probably going to be
 OK with a 2-node cluster initially.

 Could you guys also recommend some minimum memory to start with? Of course
 that would depend on my workload as well, but that's why I am asking for
 the minimum.

 On Wed, Feb 29, 2012 at 7:40 AM, Maki Watanabe watanabe.m...@gmail.com
 wrote:

 If you run your service with 2 nodes and RF=2, your data will be
 replicated, but your service will not be redundant (you can't stop both
 nodes).

 If your service doesn't need strong consistency (i.e. you allow Cassandra
 to return old data after a write, and possible write loss), you can use
 CL=ONE for read and write to keep availability.

 maki

 --
 w3m





Re: Using cassandra at minimal expenditures

2012-02-28 Thread Ertio Lew
@Aaron: Are you suggesting 3 nodes (rather than 2) to allow quorum
operations even during the temporary loss of one node from the cluster's
reach? I understand this, but another question popped up in my mind:
probably because I'm not very experienced managing Cassandra, I'm unaware
whether it is a usual occurrence for some of the n nodes of a cluster to be
down, unresponsive, or out of the cluster's reach. (I had actually
considered this situation an exceptional circumstance, not a normal one!)


On Tue, Feb 28, 2012 at 2:34 AM, aaron morton aa...@thelastpickle.com wrote:

 *1.* I am wondering *what is the minimum recommended cluster size to
 start with*?

 IMHO 3
 http://thelastpickle.com/2011/06/13/Down-For-Me/

 A

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 28/02/2012, at 8:17 AM, Ertio Lew wrote:

 Hi

 I'm creating a networking site using Cassandra. I want to host this
 application, but initially with the lowest possible resources, then slowly
 increase them as per the service's demand & need.

 *1.* I am wondering *what is the minimum recommended cluster size to
 start with*?
 Are there any issues if I start with as few as 2 nodes in the cluster? In
 that case I guess I would have a replication factor of 2.
 (This way I would require at minimum 3 VPSes: 1 as web server & 2 for the
 Cassandra cluster, right?)

 *2.* Is anyone using Cassandra with such minimal resources in production
 environments? Any experiences or difficulties encountered?

 *3.* Would you like to recommend some hosting service suitable for me, or
 to suggest some other ways to minimize the resources (actually, the
 hosting expenses)?





Re: Using cassandra at minimal expenditures

2012-02-28 Thread Ertio Lew
Thanks, I think I don't need high consistency (as per my app requirements),
so I might be fine with CL.ONE instead of quorum; I'm probably going to be
OK with a 2-node cluster initially.

Could you guys also recommend some minimum memory to start with? Of course
that would depend on my workload as well, but that's why I am asking for
the minimum.

On Wed, Feb 29, 2012 at 7:40 AM, Maki Watanabe watanabe.m...@gmail.com wrote:

  If you run your service with 2 nodes and RF=2, your data will be
  replicated, but your service will not be redundant (you can't stop both
  nodes).

 If your service doesn't need strong consistency (i.e. you allow Cassandra
 to return old data after a write, and possible write loss), you can use
 CL=ONE for read and write to keep availability.

 maki



Using cassandra at minimal expenditures

2012-02-27 Thread Ertio Lew
Hi

I'm creating a networking site using Cassandra. I want to host this
application, but initially with the lowest possible resources, then slowly
increase them as per the service's demand & need.

*1.* I am wondering *what is the minimum recommended cluster size to start
with*?
Are there any issues if I start with as few as 2 nodes in the cluster? In
that case I guess I would have a replication factor of 2.
(This way I would require at minimum 3 VPSes: 1 as web server & 2 for the
Cassandra cluster, right?)

*2.* Is anyone using Cassandra with such minimal resources in production
environments? Any experiences or difficulties encountered?

*3.* Would you like to recommend some hosting service suitable for me, or
to suggest some other ways to minimize the resources (actually, the
hosting expenses)?


Any tools like phpMyAdmin to see data stored in Cassandra ?

2012-01-29 Thread Ertio Lew
I have tried Sebastien's phpMyAdmin for Cassandra
(https://github.com/sebgiroux/Cassandra-Cluster-Admin) to see the data
stored in Cassandra in the same manner phpMyAdmin allows. But since it
makes assumptions about the datatypes of the column names/values & doesn't
allow configuring, on a per-CF basis, the datatype the data should be read
as, I couldn't make the best use of it.

Are there other similar tools out there that can do the job better?


Re: Any tools like phpMyAdmin to see data stored in Cassandra ?

2012-01-29 Thread Ertio Lew
On Mon, Jan 30, 2012 at 7:16 AM, Frisch, Michael
michael.fri...@nuance.com wrote:

  OpsCenter?

  http://www.datastax.com/products/opscenter

  - Mike


  I have tried Sebastien's phpMyAdmin for Cassandra
  (https://github.com/sebgiroux/Cassandra-Cluster-Admin) to see the data
  stored in Cassandra in the same manner phpMyAdmin allows. But since it
  makes assumptions about the datatypes of the column names/values &
  doesn't allow configuring, on a per-CF basis, the datatype the data
  should be read as, I couldn't make the best use of it.

  Are there other similar tools out there that can do the job better?


Thanks, that's a great product, but unfortunately it doesn't work on
Windows. Any tools for Windows?


Re: Using 5-6 bytes for cassandra timestamps vs 8…

2012-01-19 Thread Ertio Lew
It obviously won't matter if your columns are fat, but there are several
cases (at least I can think of several) where you need, for example, to
store just an integer column name & an empty column value. Then 12 bytes
for the column, of which 8 bytes is just the overhead of storing the
timestamp, doesn't look very nice. And skinny columns are a very common
use case, I believe.

On Thu, Jan 19, 2012 at 1:26 PM, Maxim Potekhin potek...@bnl.gov wrote:

 I must have accidentally deleted all messages in this thread save this one.

 On face value, we are talking about saving 2 bytes per column. I know it
 can add up with many columns, but relative to the size of the column --
 is it THAT significant?

 I made an effort to minimize my CF footprint by replacing the natural
 column keys with integers (and translating back and forth when writing
 and reading). It's easy to see that in my case I achieve almost 50%
 storage savings, and at least 30%. But if the column in question contains
 more than 20 bytes -- what's the point of trying to save 2?

 Cheers

 Maxim



 On 1/18/2012 11:49 PM, Ertio Lew wrote:

 I believe the timestamps *on a per-column basis* are only required until
 compaction time; after that, it could also work if the timestamp range
 were specified globally, on a per-SSTable basis. The timestamps kept
 until compaction then only need to measure the time from the
 initialization of the new memtable to the point the column is written to
 that memtable. You could easily fit that time in 4 bytes, which I believe
 would save at least 4 bytes of overhead for each column.

 Is anything related to these overheads under consideration, or planned in
 the roadmap?



 On Tue, Sep 6, 2011 at 11:44 AM, Oleg Anastastasyev olega...@gmail.com wrote:

 I have a patch for trunk which I just have to get time to test a bit
 before I submit.
 It is for super columns and will use the super column's timestamp as the
 base and only store variant-encoded offsets in the underlying columns.

 Could you please measure how much real benefit it brings (in real RAM
 consumption by the JVM)? It is hard to tell whether it will give
 noticeable results or not. AFAIK the memory structures used for the
 memtable consume much more memory, and a 64-bit JVM allocates memory
 aligned to a 64-bit word boundary. So a 37% memory-consumption reduction
 looks doubtful.






Re: Using 5-6 bytes for cassandra timestamps vs 8…

2012-01-18 Thread Ertio Lew
I believe the timestamps *on a per-column basis* are only required until
compaction time; after that, it could also work if the timestamp range
were specified globally, on a per-SSTable basis. The timestamps kept until
compaction then only need to measure the time from the initialization of
the new memtable to the point the column is written to that memtable. You
could easily fit that time in 4 bytes, which I believe would save at least
4 bytes of overhead for each column.

Is anything related to these overheads under consideration, or planned in
the roadmap?



On Tue, Sep 6, 2011 at 11:44 AM, Oleg Anastastasyev olega...@gmail.com wrote:

 I have a patch for trunk which I just have to get time to test a bit
 before I submit.
 It is for super columns and will use the super column's timestamp as the
 base and only store variant-encoded offsets in the underlying columns.

 Could you please measure how much real benefit it brings (in real RAM
 consumption by the JVM)? It is hard to tell whether it will give
 noticeable results or not. AFAIK the memory structures used for the
 memtable consume much more memory, and a 64-bit JVM allocates memory
 aligned to a 64-bit word boundary. So a 37% memory-consumption reduction
 looks doubtful.




Re: Composite column names: How much space do they occupy ?

2012-01-02 Thread Ertio Lew
Sorry, I forgot to mention that I'm using Hector to communicate with
Cassandra. CS.toByteBuffer converts the composite type name to a
ByteBuffer.

Can anyone familiar with the Hector API enlighten me as to why I am seeing
these sizes for the composite type names?

On Mon, Jan 2, 2012 at 2:52 PM, aaron morton aa...@thelastpickle.com wrote:

 What is the definition of the composite type and what is CS.toByteBuffer ?

 CompositeTypes have a small overhead see
 https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/marshal/CompositeType.java

 Hope that helps.
 Aaron

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 2/01/2012, at 6:25 PM, Ertio Lew wrote:

 I am storing composite column names made up of two integer components.
 However, I am shocked by the apparent storage overhead of these.

 I just tried out a composite name (with a single integer component):

   Composite composite = new Composite();
   composite.addComponent(-165376575, is);

 System.out.println(CS.toByteBuffer( composite ).array().length); // the
 result is 256

 After writing & then reading back this composite column from Cassandra:

 System.out.println(CS.toByteBuffer( readColumn.getName() ).array().length);
 // the result is 91

 What is the actual storage overhead? I am quite sure I'm making a mistake
 in interpreting these values.





Re: Composite column names: How much space do they occupy ?

2012-01-02 Thread Ertio Lew
Yes, that makes a lot of sense! Using the remaining() method I see the
proper expected sizes.
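The difference Sylvain describes (quoted below) reproduces with nothing but the JDK; the 256 in the original post is just the capacity of the backing array, not the bytes the composite actually serialized to:

import java.nio.ByteBuffer;

public class BufferLength {
    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(256); // e.g. a client-side scratch buffer
        bb.putInt(-165376575);                    // one 4-byte component
        bb.flip();                                // position=0, limit=4

        System.out.println(bb.array().length);    // 256 -- size of the backing array
        System.out.println(bb.remaining());       // 4   -- the effective length
    }
}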


On Mon, Jan 2, 2012 at 5:26 PM, Sylvain Lebresne sylv...@datastax.com wrote:

 I am not familiar enough with Hector to tell you if it is doing something
 special here, but note that:

 1) you may have better luck getting that kind of question answered
 quickly by using the Hector mailing list.

 2) that may or may not change what you're seeing (since again I don't
 know what Hector is actually doing), but bb.array().length is not a
 reliable way to get the effective length of a ByteBuffer, as it is
 perfectly legit to have a byte buffer only use part of its underlying
 array. You should use the remaining() method instead.

 --
 Sylvain

 On Mon, Jan 2, 2012 at 12:29 PM, Ertio Lew ertio...@gmail.com wrote:
  Sorry, I forgot to mention that I'm using Hector to communicate with
  Cassandra. CS.toByteBuffer converts the composite type name to a
  ByteBuffer.
 
  Can anyone familiar with the Hector API enlighten me as to why I am
  seeing these sizes for the composite type names?
 
 
  On Mon, Jan 2, 2012 at 2:52 PM, aaron morton aa...@thelastpickle.com
  wrote:
 
  What is the definition of the composite type and what is CS.toByteBuffer
  ?
 
  CompositeTypes have a small overhead
  see
 https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/marshal/CompositeType.java
 
  Hope that helps.
  Aaron
 
  -
  Aaron Morton
  Freelance Developer
  @aaronmorton
  http://www.thelastpickle.com
 
  On 2/01/2012, at 6:25 PM, Ertio Lew wrote:
 
  I am storing composite column names made up of two integer components.
  However, I am shocked by the apparent storage overhead of these.
 
  I just tried out a composite name (with a single integer component):
 
    Composite composite = new Composite();
    composite.addComponent(-165376575, is);
 
  System.out.println(CS.toByteBuffer( composite ).array().length); // the
  result is 256
 
  After writing & then reading back this composite column from Cassandra:
 
  System.out.println(CS.toByteBuffer( readColumn.getName() ).array().length);
  // the result is 91
 
  What is the actual storage overhead? I am quite sure I'm making a
  mistake in interpreting these values.
 
 
 



Composite column names: How much space do they occupy ?

2012-01-01 Thread Ertio Lew
I am storing composite column names made up of two integer components.
However, I am shocked by the apparent storage overhead of these.

I just tried out a composite name (with a single integer component):

  Composite composite = new Composite();
  composite.addComponent(-165376575, is);

System.out.println(CS.toByteBuffer( composite ).array().length); // the
result is 256

After writing & then reading back this composite column from Cassandra:

System.out.println(CS.toByteBuffer( readColumn.getName() ).array().length);
// the result is 91

What is the actual storage overhead? I am quite sure I'm making a mistake
in interpreting these values.


Doubts related to composite type column names/values

2011-12-20 Thread Ertio Lew
With regard to the composite columns stuff in Cassandra, I have the
following doubts :

1. What is the storage overhead of composite type column names/values, and

2. what exactly is the difference between DynamicComposite and
StaticComposite?


Re: Second Cassandra users survey

2011-11-03 Thread Ertio Lew
Provide an option to sort columns by timestamp, i.e., in the order they
were added to the row, with the facility to use any column names.

On Wed, Nov 2, 2011 at 4:29 AM, Jonathan Ellis jbel...@gmail.com wrote:

 Hi all,

 Two years ago I asked for Cassandra use cases and feature requests.
 [1]  The results [2] have been extremely useful in setting and
 prioritizing goals for Cassandra development.  But with the release of
 1.0 we've accomplished basically everything from our original wish
 list. [3]

 I'd love to hear from modern Cassandra users again, especially if
 you're usually a quiet lurker.  What does Cassandra do well?  What are
 your pain points?  What's your feature wish list?

 As before, if you're in stealth mode or don't want to say anything in
 public, feel free to reply to me privately and I will keep it off the
 record.

 [1]
 http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
 [2]
 http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
 [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Retreiving column by names Vs by range, which is more performant ?

2011-11-03 Thread Ertio Lew
Retrieving columns by names vs. by range: which is more performant, when
you have the option to do both?


Re: Newbie question - fetching multiple columns of different datatypes and conversion from byte[]

2011-10-31 Thread Ertio Lew
Should column values or names of different datatypes first be read as byte
buffers & then converted to the appropriate type using Hector's serializers
API, in the way shown below?

ByteBuffer bb;
..

String s = StringSerializer.get().fromByteBuffer(bb);


Or are there any better ways ?
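That is the usual pattern. A small sketch of what it can look like (the column names and their type mapping are invented for illustration; Cassandra only stores bytes, so the per-column type knowledge must come from your own schema conventions):

import java.nio.ByteBuffer;
import me.prettyprint.cassandra.serializers.IntegerSerializer;
import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;

public class Convert {
    // Decode one column value according to which column it came from.
    static Object decode(String columnName, ByteBuffer value) {
        if (columnName.equals("age")) {
            return IntegerSerializer.get().fromByteBuffer(value);
        } else if (columnName.equals("lastLogin")) {
            return LongSerializer.get().fromByteBuffer(value);
        } else {
            return StringSerializer.get().fromByteBuffer(value);
        }
    }
}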


Re: Cassandra Cluster Admin - phpMyAdmin for Cassandra

2011-10-31 Thread Ertio Lew
Thanks so much, SebWajam, for this great piece of work!

Is there a way to set a datatype for displaying the column names/values of
a CF? It seems that your project always uses a String serializer for every
piece of data; however, in most real-world cases this is not true. Can we
somehow configure which serializer to use while reading the data, so that
the data may be properly identified by your project & delivered in a
readable format?

On Mon, Aug 22, 2011 at 7:17 AM, SebWajam sebast...@wajam.com wrote:

 Hi,

 I've been working on this project for a few months now and I think it's
 mature enough to post it here:
 Cassandra Cluster Admin on GitHub
 https://github.com/sebgiroux/Cassandra-Cluster-Admin

 Basically, it's a GUI for Cassandra. If you're like me and used MySQL for
 a while (and are still using it!), you get used to phpMyAdmin and its
 simple and easy-to-use user interface. I thought it would be nice to have
 a similar tool for Cassandra and I couldn't find any, so I built my own!

 Supported actions:

- Keyspace manipulation (add/edit/drop)
- Column Family manipulation (add/edit/truncate/drop)
- Row manipulation on column family and super column family
(insert/edit/remove)
- Basic data browser to navigate in the data of a column family (seems
to be the favorite feature so far)
- Support Cassandra 0.8+ atomic counters
- Support management of multiple Cassandra clusters

 Bug report and/or pull request are always welcome!




ByteBuffer as an initial serializer to read columns with mixed datatypes ?

2011-10-30 Thread Ertio Lew
I have a mix of byte[] & Integer column names/values within my CF rows. So
should ByteBuffer be my initial choice of serializer when making the read
query for the mixed datatypes, & should I then retrieve the byte[] or
Integer from the ByteBuffer using the ByteBuffer API's getInt() method?

Is this a preferable way to read columns with integer/byte[] names:
initially as ByteBuffer(s), & later converting them to Integer or byte[]?
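For what it's worth, a JDK-only sketch of the approach asked about here: read everything as ByteBuffers, then either pull an int with getInt() or copy the raw bytes out, according to what the column is known to hold. duplicate() keeps the original buffer's position untouched:

import java.nio.ByteBuffer;

public class MixedRead {
    static int asInt(ByteBuffer bb) {
        return bb.duplicate().getInt();       // reads 4 bytes from the current position
    }

    static byte[] asBytes(ByteBuffer bb) {
        ByteBuffer d = bb.duplicate();
        byte[] out = new byte[d.remaining()]; // copy only the effective bytes
        d.get(out);
        return out;
    }
}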


Re: Authentication setup

2011-10-22 Thread Ertio Lew
Hey,

I too am looking for a similar thing. I guess this is a very common
requirement & may soon be provided as built-in functionality packaged with
the Cassandra setup.

Btw, it would be nice to hear if someone has ideas about how to implement
this for now.




On Fri, Oct 21, 2011 at 6:53 PM, Alexander Konotop 
alexander.kono...@gmail.com wrote:

 Hello :-)
 Does anyone have a working config with normal, secure authentication?
 I've just installed Cassandra 1.0.0 and see that SimpleAuthenticator is
 meant to be non-secure and was moved to the examples. I need a production
 config, so I tried writing this in the config:

 authenticator: org.apache.cassandra.auth.AuthenticatedUser
 authority: org.apache.cassandra.auth.AuthenticatedUser

 But during Cassandra startup the log says:

 org.apache.cassandra.config.ConfigurationException: No default
 constructor for authenticator class
 'org.apache.cassandra.auth.AuthenticatedUser'.

 As I understand it, either AuthenticatedUser is the wrong class or I
 simply don't know how to set it up - does it need additional configs
 similar to access.properties or passwd.properties? Maybe there's a way to
 store users in the Cassandra DB itself, like, for example, MySQL does?

 I've searched and tried lots of things the whole day, but the only info I
 found was two phrases - the first said that SimpleAuth is just a toy, and
 the second said to look into the source for more auth methods. But, for
 example, this:
 
 package org.apache.cassandra.auth;

 import java.util.Collections;
 import java.util.Set;

 /**
  * An authenticated user and her groups.
  */
 public class AuthenticatedUser
 {
     public final String username;
     public final Set<String> groups;

     public AuthenticatedUser(String username)
     {
         this.username = username;
         this.groups = Collections.emptySet();
     }

     public AuthenticatedUser(String username, Set<String> groups)
     {
         this.username = username;
         this.groups = Collections.unmodifiableSet(groups);
     }

     @Override
     public String toString()
     {
         return String.format("#User %s groups=%s", username, groups);
     }
 }
 
 tells me just about nothing :-(

 Best regards
 Alexander
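A note on the error above: AuthenticatedUser is the class representing a logged-in user, not an IAuthenticator implementation, which is why Cassandra complains about the missing constructor. In the 1.0 era the example implementations were SimpleAuthenticator and SimpleAuthority, shipped under examples/ and built separately onto the classpath; with those in place the wiring looks roughly like this (file paths are placeholders):

authenticator: org.apache.cassandra.auth.SimpleAuthenticator
authority: org.apache.cassandra.auth.SimpleAuthority

plus JVM options pointing at the property files, e.g.
-Dpasswd.properties=/etc/cassandra/passwd.properties
-Daccess.properties=/etc/cassandra/access.properties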



Using counters in 0.8

2011-05-18 Thread Ertio Lew
I am using Hector for a project & wanted to try out counters with the
latest 0.8 version of Cassandra.

How do we work with counters in the 0.8 version? Any web links to such
examples are appreciated.
Has Hector started to provide an API for that?
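Hector did grow a counter API alongside Cassandra 0.8. A sketch from memory (method names should be checked against your Hector version; "Counters" and "pageViews" are placeholder names):

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class CounterExample {
    // Add 1 to a counter column.
    static void bump(Keyspace ks, String rowKey) {
        Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
        m.incrementCounter(rowKey, "Counters", "pageViews", 1L);
        m.execute();
    }

    // Read the current counter value back.
    static long read(Keyspace ks, String rowKey) {
        return HFactory
                .createCounterColumnQuery(ks, StringSerializer.get(), StringSerializer.get())
                .setColumnFamily("Counters")
                .setKey(rowKey)
                .setName("pageViews")
                .execute().get().getValue();
    }
}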


Columns values(integer) need frequent updates/ increments

2011-04-07 Thread Ertio Lew
Hi,

I am working on a Question/Answers web app using Cassandra(consider very
similar to StackOverflow sites). I need to built the reputation system for
users on the application. This way the user's reputation increases when s/he
answers correctly somebody's question. Thus if I keep the reputation score
of users as column values, these columns are very very frequently updated.
Thus I have several versions of a single column which I guess is very bad.

Similarly for the questions as well, the no of up-votes will increase very
very frequently and  hence again I'll get several versions of same column.

How should I try to minimize this ill effect?

** What I thought of..
Try using a separate CF for reputation system, so that the memtable stores
most of the columns(containing reputation scores of the users). Thus
frequent updates will update the column in the memtable, which means more
easier reads as well as updates. These reputations columns are anyways small
 do not explode in numbers(they only replace another column).


Is it possible to get just a count of the no of columns in a row, in efficient manner ?

2011-03-13 Thread Ertio Lew
Can I get just a count of the number of columns in a row without
deserializing all the columns in the row? Or should a counter column that
maintains the number of columns currently in the row be preferred, for
situations where the total count is read much more frequently than the
actual columns?
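Thrift's get_count does the counting server-side, so no column data crosses the wire; the server still reads the row internally, though, which is why a dedicated counter can win for very wide rows. A sketch of how Hector exposes this (names from memory; verify against your Hector version):

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class ColumnCount {
    // Count the columns of one row without fetching them to the client.
    static int countColumns(Keyspace ks, String rowKey, String cf) {
        return HFactory
                .createCountQuery(ks, StringSerializer.get(), StringSerializer.get())
                .setColumnFamily(cf)
                .setKey(rowKey)
                .setRange(null, null, Integer.MAX_VALUE) // whole row
                .execute().get();
    }
}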


Re: Using a synchronized counter that keeps track of no of users on the application using it to allot UserIds/ keys to the new users after sign up

2011-02-28 Thread Ertio Lew
Hi Ryan,

I am considering snowflake as an option for generating IDs for a
distributed application using Cassandra.
As I have come to know, snowflake uses 64-bit IDs. I am looking for a
solution that could help me generate 64-bit IDs, but within those 64 bits
I would like at least 4 free bits that I could use to distinguish the two
rows for the same entity (split by kind of data) in the same column family.

If I could keep snowflake's ID size to around 60 bits, that would be great
for my use case. Is it possible to manipulate the bits safely down to
around 60 bits? Perhaps the timestamp precision is not required to that
much depth for my use case.

Any kind of suggestions would be appreciated.

Best Regards
Ertio Lew







On Fri, Feb 4, 2011 at 1:09 AM, Ryan King r...@twitter.com wrote:
 You could also consider snowflake:

 http://github.com/twitter/snowflake

 which gives you ids that roughly sort by time (but aren't sequential).

 -ryan

 On Thu, Feb 3, 2011 at 11:13 AM, Matthew E. Kennedy
 matt.kenn...@spadac.com wrote:
 Unless you need your user identifiers to be sequential for some reason, I 
 would save yourself the headache of this kind of complexity and just use 
 UUIDs if you have to generate an identifier.

 On Feb 3, 2011, at 2:03 PM, Aklin_81 wrote:

 Hi all,
 To generate new keys/ UserIds for new users on my application, I am
 thinking of using a simple synchronized counter that can keep track of
 the no. of users registered on my application and when a new user
 signs up, he can be allotted the next available id.

 Since Cassandra is eventually consistent, is this advisable to implement
 with Cassandra? Then again, I could also use a stronger consistency level
 like QUORUM or ALL for this purpose.


 Please let me know your thoughts and suggestions.

 Regards
 Asil





 --
 @rk



Re: Using a synchronized counter that keeps track of no of users on the application using it to allot UserIds/ keys to the new users after sign up

2011-02-28 Thread Ertio Lew
On Tue, Mar 1, 2011 at 1:26 AM, Aaron Morton aa...@thelastpickle.com wrote:
 This is mostly from memory, but the last 12 (?) bits (4096 values) are a
 counter for the number of IDs generated in a particular millisecond on
 that server. You could use the high 4 bits of that range for your
 data-type flags and the low 8 for the counter.

So then I would be able to generate a maximum of up to 256 IDs per
millisecond (or 256,000 per second) on one machine!? That seems like a
very good limit for my use case. I don't think I would ever need more than
that, since my write volumes are well below that limit.
Should I go for it, or are there still other things to consider?
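In bit-twiddling terms, Aaron's suggestion (quoted above) comes out as the sketch below. The widths of the timestamp/worker portion depend on the particular snowflake build, so treat the constants as assumptions; the point is only that a 4-bit tag in the high bits of the sequence leaves an 8-bit counter, i.e. 256 IDs per millisecond:

public class TaggedId {
    static final int TYPE_BITS = 4;
    static final int SEQ_BITS = 8;   // the 12-bit sequence minus the 4-bit tag

    // Pack a 4-bit type tag between the timestamp/worker part and the counter.
    static long pack(long timestampAndWorker, int type, int seq) {
        return (timestampAndWorker << (TYPE_BITS + SEQ_BITS))
                | ((long) (type & 0xF) << SEQ_BITS)
                | (seq & 0xFF);
    }

    static int typeOf(long id) {
        return (int) ((id >>> SEQ_BITS) & 0xF);
    }

    public static void main(String[] args) {
        long id = pack(123456789L, 3, 42);
        System.out.println(typeOf(id)); // 3
    }
}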


 Aaron

 On 1/03/2011, at 4:41 AM, Ertio Lew ertio...@gmail.com wrote:

 Hi Ryan,

 I am considering snowflake as an option for generating IDs for a
 distributed application using Cassandra.
 As I have come to know, snowflake uses 64-bit IDs. I am looking for a
 solution that could help me generate 64-bit IDs, but within those 64 bits
 I would like at least 4 free bits that I could use to distinguish the two
 rows for the same entity (split by kind of data) in the same column
 family.

 If I could keep snowflake's ID size to around 60 bits, that would be
 great for my use case. Is it possible to manipulate the bits safely down
 to around 60 bits? Perhaps the timestamp precision is not required to
 that much depth for my use case.

 Any kind of suggestions would be appreciated.

 Best Regards
 Ertio Lew







 On Fri, Feb 4, 2011 at 1:09 AM, Ryan King r...@twitter.com wrote:
 You could also consider snowflake:

 http://github.com/twitter/snowflake

 which gives you ids that roughly sort by time (but aren't sequential).

 -ryan

 On Thu, Feb 3, 2011 at 11:13 AM, Matthew E. Kennedy
 matt.kenn...@spadac.com wrote:
 Unless you need your user identifiers to be sequential for some reason, I 
 would save yourself the headache of this kind of complexity and just use 
 UUIDs if you have to generate an identifier.

 On Feb 3, 2011, at 2:03 PM, Aklin_81 wrote:

 Hi all,
 To generate new keys/ UserIds for new users on my application, I am
 thinking of using a simple synchronized counter that can keep track of
 the no. of users registered on my application and when a new user
 signs up, he can be allotted the next available id.

  Since Cassandra is eventually consistent, is this advisable to implement
  with Cassandra? Then again, I could also use a stronger consistency
  level like QUORUM or ALL for this purpose.


  Please let me know your thoughts and suggestions.

 Regards
 Asil





 --
 @rk




Specifying row caching on per query basis ?

2011-02-09 Thread Ertio Lew
Is there any way to specify, on a per-query basis (like we specify the
consistency level), which rows should be cached while you're reading them
from a row-cache-enabled CF? I believe this could lead to much more
efficient use of the cache space (if you use the same data for different
features/parts of your application which have different caching needs).


Re: Specifying row caching on per query basis ?

2011-02-09 Thread Ertio Lew
Is this under consideration for future releases, or being thought about?



On Thu, Feb 10, 2011 at 12:56 AM, Jonathan Ellis jbel...@gmail.com wrote:
 Currently there is not.

 On Wed, Feb 9, 2011 at 12:04 PM, Ertio Lew ertio...@gmail.com wrote:
 Is there any way to specify, on a per-query basis (like we specify the
 consistency level), which rows should be cached while you're reading them
 from a row-cache-enabled CF? I believe this could lead to much more
 efficient use of the cache space (if you use the same data for different
 features/parts of your application which have different caching needs).




 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-08 Thread Ertio Lew
Thanks for adding up Benjamin!

On Wed, Feb 9, 2011 at 1:40 AM, Benjamin Coverston
ben.covers...@datastax.com wrote:


 On 2/4/11 11:58 PM, Ertio Lew wrote:

 Yes, a disadvantage of a larger number of CFs in terms of memory
 utilization which I see is:

 If some CF is written less often than other CFs, then its memtable
 consumes memory until it is flushed; that memory could have been put to
 much better use by a CF that's heavily written and read. And if you try
 to make the flush thresholds smaller, more compactions are needed.


 One more disadvantage here is that with CFs that vary widely in write
 rate you can also end up with fragmented commit logs, which in some cases
 we have seen actually fill up the commit log partition. As a consequence,
 one thing to consider would be to lower the commit log flush threshold
 (in minutes) for the column families that do not see heavy use.



 On Sat, Feb 5, 2011 at 11:58 AM, Ertio Lew ertio...@gmail.com wrote:

 Thanks Tyler!

 I could not fully understand why more column families would mean more
 memory. If you have under your control parameters like
 memtable_throughput & memtable_operations, which are set on a
 per-column-family basis, then you can directly control & adjust things by
 splitting the memory space between two CFs in proportion to what you
 would do for a single CF. Hence there should be no extra memory
 consumption for multiple CFs that have been split from a single one?

 Regarding the compactions: I think even if there are more of them, the
 SSTable files to be compacted are smaller, as the data has been split in
 two. So more compactions, but smaller ones too!

 Then, given the same amount of data, how can a greater number of column
 families be a bad option (if you split the values of the
 memory-consumption parameters proportionately)?

 --
 Regards,
 Ertio





 On Sat, Feb 5, 2011 at 10:43 AM, Tyler Hobbsty...@datastax.com  wrote:

  I read somewhere that a larger number of column families is not a good
  idea, as it consumes more memory and causes more compactions

 This is primarily true, but not in every case.

 But the caching requirements may be different as they cater to two
 different features.

 This is a great reason to *not* merge them.  Besides the key and row
 caches,
 don't forget about the OS buffer cache.

  Is it recommended to merge these two column families into one? Thoughts?

 No, this sounds like an anti-pattern to me.  The overhead from having
 two
 separate CFs is not that high.

 --
 Tyler Hobbs
 Software Engineer, DataStax
 Maintainer of the pycassa Cassandra Python client library





Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-05 Thread Ertio Lew
Thanks Tyler!

I think I'll have to very carefully take all these factors into
consideration before deciding how to split my data into CFs, as this has
no objective answer. I am expecting at least around 8 column families for
my entire application if I split the data strictly according to the
various features and requirements of the application.

I think there should be a provision for specifying, on a per-query basis,
which rows should be cached while you're reading them from a
row-cache-enabled CF. Then you could easily merge similar data for
different features of your application into a single CF. I believe this
would also lead to much more efficient use of the cache space (if you were
using the same data for different parts of your app which have different
caching needs).

Regards,

Ertio

On Sun, Feb 6, 2011 at 1:22 AM, Tyler Hobbs ty...@datastax.com wrote:
 If you have under your control parameters like memtable_throughput &
 memtable_operations, which are set on a per-column-family basis, then you
 can directly control & adjust things by splitting the memory space
 between two CFs in proportion to what you would do for a single CF.
 Hence there should be no extra memory consumption for multiple CFs that
 have been split from a single one?

 Yes, I think you have the right idea here. There is a small amount of
 overhead for the extra memtable and for keeping track of a second set of
 indexes, bloom filters, SSTables, etc.

 Regarding the compactions: I think even if there are more of them, the
 SSTable files to be compacted are smaller, as the data has been split in
 two. So more compactions, but smaller ones too!

 Yes.

 If some CF is written less often than other CFs, then its memtable
 consumes memory until it is flushed; that memory could have been put to
 much better use by a CF that's heavily written and read. And if you try
 to make the flush thresholds smaller, more compactions are needed.

 If you merge the two CFs together, then updates to the 'less frequent'
 rows will still consume memory, only it will all be within one memtable.
 (Memtables grow in size until they are flushed; they don't reserve some
 set amount of memory.) Furthermore, because your memtables will be filled
 up by the 'more frequent' rows, the 'less frequent' rows will get fewer
 updates/overwrites in memory, so they will tend to be spread across a
 greater number of SSTables.

 --
 Tyler Hobbs
 Software Engineer, DataStax
 Maintainer of the pycassa Cassandra Python client library




Merging the rows of two column families(with similar attributes) into one ??

2011-02-04 Thread Ertio Lew
I read somewhere that a larger number of column families is not a good
idea, as it consumes more memory and causes more compactions; thus I am
trying to reduce the number of column families by adding the rows of other
column families (with similar attributes) as separate rows into one.

I have two kinds of data for two separate features of my application. If I
store them in two different column families, then both will have similar
attributes, like the same comparator type & sorting needs. Thus I can also
merge both of them into one column family, just by adding the rows of one
to the other (increasing the number of rows). However, some rows of the
1st kind of data are used very frequently and rows of the 2nd kind are
used less frequently. But I don't think this will be a problem, as I am
not merging two rows into one, just adding them as separate rows in the
column family.
The 1st kind of data has wider rows and the 2nd kind has much narrower
rows.

But the caching requirements may be different, as they cater to two
different features. (Though I think this is even advantageous, since
resources are free to be utilized by whichever data is more frequently
used.)


Is it recommended to merge these two column families into one? Thoughts?

--

Ertio


Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-04 Thread Ertio Lew
Thanks Tyler!

I could not fully understand why more column families would mean more
memory. If you have under your control parameters like
memtable_throughput & memtable_operations, which are set on a
per-column-family basis, then you can directly control & adjust things by
splitting the memory space between two CFs in proportion to what you would
do for a single CF. Hence there should be no extra memory consumption for
multiple CFs that have been split from a single one?

Regarding the compactions: I think even if there are more of them, the
SSTable files to be compacted are smaller, as the data has been split in
two. So more compactions, but smaller ones too!


Then, given the same amount of data, how can a greater number of column
families be a bad option (if you split the values of the
memory-consumption parameters proportionately)?

--
Regards,
Ertio





On Sat, Feb 5, 2011 at 10:43 AM, Tyler Hobbs ty...@datastax.com wrote:

 I read somewhere that a larger number of column families is not a good
 idea, as it consumes more memory and causes more compactions

 This is primarily true, but not in every case.

 But the caching requirements may be different as they cater to two
 different features.

 This is a great reason to *not* merge them.  Besides the key and row caches,
 don't forget about the OS buffer cache.

 Is it recommended to merge these two column families into one? Thoughts?

 No, this sounds like an anti-pattern to me.  The overhead from having two
 separate CFs is not that high.

 --
 Tyler Hobbs
 Software Engineer, DataStax
 Maintainer of the pycassa Cassandra Python client library




Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-04 Thread Ertio Lew
Yes, a disadvantage of a larger number of CFs in terms of memory
utilization which I see is:

If some CF is written less often than other CFs, then its memtable
consumes memory until it is flushed; that memory could have been put to
much better use by a CF that's heavily written and read. And if you try to
make the flush thresholds smaller, more compactions are needed.





On Sat, Feb 5, 2011 at 11:58 AM, Ertio Lew ertio...@gmail.com wrote:
 Thanks Tyler!

 I could not fully understand why more column families would mean more
 memory. If you have under your control parameters like
 memtable_throughput & memtable_operations, which are set on a
 per-column-family basis, then you can directly control & adjust things by
 splitting the memory space between two CFs in proportion to what you
 would do for a single CF. Hence there should be no extra memory
 consumption for multiple CFs that have been split from a single one?

 Regarding the compactions: I think even if there are more of them, the
 SSTable files to be compacted are smaller, as the data has been split in
 two. So more compactions, but smaller ones too!

 Then, given the same amount of data, how can a greater number of column
 families be a bad option (if you split the values of the
 memory-consumption parameters proportionately)?

 --
 Regards,
 Ertio





 On Sat, Feb 5, 2011 at 10:43 AM, Tyler Hobbs ty...@datastax.com wrote:

 I read somewhere that a larger number of column families is not a good
 idea, as it consumes more memory and causes more compactions

 This is primarily true, but not in every case.

 But the caching requirements may be different as they cater to two
 different features.

 This is a great reason to *not* merge them.  Besides the key and row caches,
 don't forget about the OS buffer cache.

 Is it recommended to merge these two column families into one? Thoughts?

 No, this sounds like an anti-pattern to me.  The overhead from having two
 separate CFs is not that high.

 --
 Tyler Hobbs
 Software Engineer, DataStax
 Maintainer of the pycassa Cassandra Python client library





Re: Can a same key exists for two rows in two different column families without clashing ?

2011-02-02 Thread Ertio Lew
Thanks, Stephen, for the great explanation!



On Wed, Feb 2, 2011 at 4:31 PM, Stephen Connolly 
stephen.alan.conno...@gmail.com wrote:

 On 2 February 2011 10:03, Ertio Lew ertio...@gmail.com wrote:
  Can the same key exist for two rows in two different column families
  without clashing? In other words, does the same algorithm need to be
  enforced for generating keys across different column families, or can
  different key-generation algorithms be used on a per-column-family
  basis?
 
  I have verified that they can, but I wanted to know if there may be any
  problems associated with this.
 
  Thanks.
  Ertio Lew
 

 It is a bad analogy for many reasons, but if you replace 'row key' with
 'primary key' and 'column family' with 'table' then you might get an
 answer.

 a better analogy is to think of the following.

 public class Keyspace {

   public final Map<String, Map<String, byte[]>> columnFamily1;

   public final Map<String, Map<String, byte[]>> columnFamily2;

   public final Map<String, Map<String, Map<String, byte[]>>>
 superColumnFamily3;

 }

 (still not quite correct, but mostly so for our purposes);

 you are asking given

 Keyspace keyspace;
 String key1 = makeKeyAlg1();
 keyspace.columnFamily1.put(key1,...);

 String key2 = makeKeyAlg2();
 keyspace.columnFamily2.put(key2,...);

 when key1.equals(key2)

 then is there a problem?

 They are two separate maps... why would there be.

 -Stephen



Re: Is it recommended to store two types of data (not related to each other but need to be retrieved together) in one super column family ?

2011-01-29 Thread Ertio Lew
Could someone please point me in the right direction by commenting on the
above ideas?

On Fri, Jan 28, 2011 at 11:50 PM, Ertio Lew ertio...@gmail.com wrote:

 Hi,

 I have two kinds of data that I would like to fit in one supercolumn
 family; I am trying this for the sake of fast database retrieval, by
 combining the data of two rows into just one row.

 The first kind of data in the supercolumn family uses timeUUIDs as
 supercolumn names; think of these as the postIds of posts in a group.
 These posts need to be sorted by time (so that the list of latest posts
 can be retrieved). Thus each post has one supercolumn, with a name of the
 form (timeUUID+userID), sorted by TimeUUIDType.

 The second kind of data would be just a single supercolumn containing
 columns with the userIds of all members in a group (very small; the
 number of members in a group will be around 40-50 max). The name of this
 single supercolumn may be chosen suitably (perhaps a maximum future time)
 so as to keep this supercolumn at the beginning.

 (The supercolumns are required because we need to store some additional
 data in the columns of the 1st kind of data.)

 So is it recommended to store these two types of data (not related to
 each other, but needing to be retrieved together) in one supercolumn
 family?



Is it recommended to store two types of data (not related to each other but need to be retrieved together) in one super column family ?

2011-01-28 Thread Ertio Lew
Hi,

I have two kinds of data that I would like to fit in one supercolumn
family; I am trying this for the sake of fast database retrieval, by
combining the data of two rows into just one row.

The first kind of data in the supercolumn family uses timeUUIDs as
supercolumn names; think of these as the postIds of posts in a group.
These posts need to be sorted by time (so that the list of latest posts
can be retrieved). Thus each post has one supercolumn, with a name of the
form (timeUUID+userID), sorted by TimeUUIDType.

The second kind of data would be just a single supercolumn containing
columns with the userIds of all members in a group (very small; the number
of members in a group will be around 40-50 max). The name of this single
supercolumn may be chosen suitably (perhaps a maximum future time) so as
to keep this supercolumn at the beginning.

(The supercolumns are required because we need to store some additional
data in the columns of the 1st kind of data.)

So is it recommended to store these two types of data (not related to each
other, but needing to be retrieved together) in one supercolumn family?


Re: What is be the best possible client option available to a PHP developer for implementing an application ready for production environments ?

2011-01-18 Thread Ertio Lew
I think we might need to go with a full Java implementation in that case,
and live with Hector, as we do not find any better option.

@Dave: Thanks for the links, but we would much prefer not to go with the
Thrift implementation, because of its frequently changing API and other
complexities.

Also, we would not like to lock ourselves into an implementation in a
language whose client options have limitations that we can bear now but
not necessarily in the future.

If anybody else has a better solution to this, please let me know.

Thank you all.
Ertio Lew


On Tue, Jan 18, 2011 at 2:49 PM, Dave Gardner dave.gard...@imagini.net wrote:
 I can't comment on phpcassa directly, but we use Cassandra plus PHP in
 production without any difficulties. We are happy with the performance.

 Most of the information we needed to get started we found here:

 https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP

 This includes details on how to compile the native PHP C extension for
 Thrift. We use a bespoke client which wraps the Thrift interface.

 You may be better off with a higher-level client, although when we were
 starting out there was less of a push away from Thrift directly. I found
 using Thrift useful, as you gain an appreciation for what calls Cassandra
 actually supports. One potential advantage of using a higher-level client
 is that it may protect you from the frequent Thrift interface changes
 which currently seem to accompany every major release.

 Dave




 On Tuesday, 18 January 2011, Tyler Hobbs ty...@riptano.com wrote:

 1.) Is it developed to the level needed to support all the necessary
 features to take full advantage of Cassandra?

 Yes. It doesn't have some of the niceties of pycassa yet, but you can do
 everything that Cassandra offers with it.

 2.) Is it used in production by anyone?

 Yes, I've talked to at least a few people who are using it in production.
 It tends to play a limited role rather than a central one, though.

 3.) What are its limitations?

 Being written in PHP. Seriously. The lack of universal 64-bit integer
 support can be problematic if you don't have a fully 64-bit system. PHP
 is fairly slow. PHP makes a few other things less easy to do. If you're
 doing some fairly lightweight interaction with Cassandra through PHP,
 these might not be a problem for you.

 - Tyler



 --
 *Dave Gardner*
 Technical Architect

 *Imagini Europe Limited*
 7 Moor Street, London W1D 5NB

 +44 20 7734 7033
 skype: daveg79
 dave.gard...@imagini.net
 http://www.visualdna.com



Do you have a site in production environment with Cassandra? What client do you use?

2011-01-14 Thread Ertio Lew
Hey,

If you have a site in production environment or considering so, what
is the client that you use to interact with Cassandra. I know that
there are several clients available out there according to the
language you use but I would love to know what clients are being used
widely in production environments and are best to work with(support
most required features for performance).

Also preferably tell about the technology stack for your applications.

Any suggestions, comments appreciated ?

Thanks
Ertio


Re: Do you have a site in production environment with Cassandra? What client do you use?

2011-01-14 Thread Ertio Lew
What technology stack do you use?

On 1/14/11, Ran Tavory ran...@gmail.com wrote:
 I use Hector, if that counts...
 On Jan 14, 2011 7:25 PM, Ertio Lew ertio...@gmail.com wrote:
 Hey,

 If you have a site in a production environment, or are considering one,
 what client do you use to interact with Cassandra? I know there are
 several clients available out there, depending on the language you use,
 but I would love to know which clients are widely used in production
 environments and are best to work with (supporting most of the features
 required for performance).

 Also, preferably tell us about the technology stack for your
 applications.

 Any suggestions or comments appreciated.

 Thanks
 Ertio