Re: Disabling Swap for Cassandra

2020-04-16 Thread J. D. Jordan

Cassandra attempts to lock the heap at startup, but all the memory allocated 
after startup is not locked.  So you do want to make sure the allowed locked 
memory is large.
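
A quick sanity check is to see whether the lock actually succeeded and what the limit currently is — a rough sketch, assuming a packaged install that runs as the "cassandra" user and logs to /var/log/cassandra (exact log wording and paths may differ):

    grep -i "unable to lock" /var/log/cassandra/system.log          # warning logged at startup if mlockall failed
    cat /proc/$(pgrep -f CassandraDaemon)/limits | grep -i lock     # "Max locked memory" of the running JVM
    # raise it, e.g. in /etc/security/limits.d/cassandra.conf:   cassandra  -  memlock  unlimited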

Disabling or vastly dialing down swappiness is a best practice for all server 
software, not just Cassandra, so you should still at the very least set the 
swappiness to some small number if you don’t want to completely disable it.
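
For example, something along these lines (just a sketch; the sysctl.d file name is arbitrary):

    sysctl vm.swappiness                                                      # check the current value
    sudo sysctl -w vm.swappiness=1                                            # small non-zero value, or swapoff -a to disable entirely
    echo 'vm.swappiness = 1' | sudo tee -a /etc/sysctl.d/99-cassandra.conf    # persist across reboots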

-Jeremiah

> On Apr 16, 2020, at 5:57 PM, Nitan Kainth  wrote:
> 
> Swap is controlled by the OS and will be used when running short of memory. I 
> don’t think you can disable it at the Cassandra level
> 
> 
> Regards,
> Nitan
> Cell: 510 449 9629
> 
>>> On Apr 16, 2020, at 5:50 PM, Kunal  wrote:
>>> 
>> 
>> Hello,
>>  
>> I need some suggestions from you all. I am new to Cassandra and was reading 
>> Cassandra best practices. In one document, it was mentioned that Cassandra 
>> should not use swap, as it degrades performance. 
>> My question is: instead of disabling swap system-wide, can we force Cassandra 
>> not to use swap? Some documentation suggests using memory_locking_policy in 
>> cassandra.yaml. 
>> 
>> How do I check if our Cassandra already has this parameter and still uses 
>> swap? Is there any way I can check this? I already checked cassandra.yaml 
>> and don't see this parameter. Is there any other place I can check and 
>> confirm?
>> 
>> Also, can I set the memlock parameter to unlimited (the default is 64 kB), so the 
>> entire heap (Xms = Xmx) can be locked at node startup? Will that help?
>> 
>> Or if you have any other suggestions, please let me know. 
>>  
>>  
>> Regards,
>> Kunal
>>  


Re: GDPR, Right to Be Forgotten, and Cassandra

2018-02-09 Thread J. D. Jordan
The times I have run into similar requirements from legislation or standards, 
the fact that SELECT no longer returns the data has been enough for all auditors 
I have worked with.
Otherwise you get down into screwy requirements of needing to zero out all 
unused sectors on your disks to actually remove the data, making sure nothing 
has the drive sectors cached somewhere, and other such things.

-Jeremiah

> On Feb 9, 2018, at 1:54 PM, Jon Haddad <j...@jonhaddad.com> wrote:
> 
> A layer violation?  Seriously?  Technical solutions exist to solve business 
> problems and I’m 100% fine with introducing former to solve the latter.
> 
> Look, if the goal is to purge information out of the DB as quickly as 
> possible from a lot of accounts, the fastest way to do it is to hijack the 
> fact that you’re constantly rewriting data through compaction and (ab)use it. 
>  It avoids the overhead of tombstones, and can be implemented in a way that 
> allows you to perform a single write / edit a text file / some other 
> trivial system and immediately start removing customer data.  It’s an 
> incredibly efficient way of bulk removing customer data.  
> 
> The wording around "The Right To Be Forgotten” is a little vague [1], and I 
> don’t know if "the right to be forgotten entitles the data subject to have 
> the data controller erase his/her personal data” means that tombstones are 
> OK.  If you tombstone some row using TWCS, it will literally *never* be 
> deleted off disk, as opposed to using DeletingCompactionStrategy where it 
> could easily be removed without leaving data laying around in SSTables.  I’ve 
> done this already for this *exact* use case and know it works and works very 
> well.
> 
> The debate around what is the “correct” way to solve the problem is a 
> dogmatic one and I don’t have any interest in pursuing it any further.  I’ve 
> simply offered a solution that I know works because I’ve done it, which is 
> what the OP asked for.
> 
> [1] https://www.eugdpr.org/key-changes.html
> 
>> On Feb 9, 2018, at 10:33 AM, Dor Laor <d...@scylladb.com> wrote:
>> 
>> I think you're introducing a layer violation. GDPR is a business requirement 
>> and
>> compaction is an implementation detail. 
>> 
>> IMHO it's enough to delete the partition using regular CQL.
>> It's true that it won't be deleted immediately but it will be eventually 
>> deleted (welcome to eventual consistency ;).
>> 
>> Even with user defined compaction, compaction may not be running instantly, 
>> repair will be required,
>> there are other nodes in the cluster, maybe partitioned nodes with the data. 
>> There is data in snapshots
>> and backups.
>> 
>> The business idea is to delete the data in a fast, reasonable time for 
>> humans and make it
>> first unreachable and later delete completely. 
>> 
>>> On Fri, Feb 9, 2018 at 8:51 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>> That might be fine for a one off but is totally impractical at scale or 
>>> when using TWCS. 
>>>> On Fri, Feb 9, 2018 at 8:39 AM DuyHai Doan <doanduy...@gmail.com> wrote:
>>>> Or use the new user-defined compaction option recently introduced, 
>>>> provided you can determine over which SSTables a partition is spread
>>>> 
>>>>> On Fri, Feb 9, 2018 at 5:23 PM, Jon Haddad <j...@jonhaddad.com> wrote:
>>>>> Give this a read through:
>>>>> 
>>>>> https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy
>>>>> 
>>>>> Basically you write your own logic for how stuff gets forgotten, then you 
>>>>> can recompact every sstable with upgradesstables -a.  
>>>>> 
>>>>> Jon
>>>>> 
>>>>> 
>>>>>> On Feb 9, 2018, at 8:10 AM, Nicolas Guyomar <nicolas.guyo...@gmail.com> 
>>>>>> wrote:
>>>>>> 
>>>>>> Hi everyone,
>>>>>> 
>>>>>> Because of GDPR we really face the need to support “Right to Be 
>>>>>> Forgotten” requests => https://gdpr-info.eu/art-17-gdpr/  stating that 
>>>>>> "the controller shall have the obligation to erase personal data without 
>>>>>> undue delay"
>>>>>> 
>>>>>> Because I usually meet customers that do not have that much clients, 
>>>>>> modeling one partition per client is almost always possible, easing 
>>>>>> deletion by partition key.
>>>>>> 
>>>>>> Then, appart from triggering a manual compaction on impacted tables 
>>>>>> using STCS, I do not see how I can be GDPR compliant.
>>>>>> 
>>>>>> I'm kind of surprised not to find any thread on that matter on the ML, 
>>>>>> do you guys have any modeling strategy that would make it easier to get 
>>>>>> rid of data ? 
>>>>>> 
>>>>>> Thank you for any given advice
>>>>>> 
>>>>>> Nicolas
>>>>> 
>>>> 
>> 
> 


Re: Introducing Cassandra 3.7 LTS

2016-10-19 Thread Jason J. W. Williams
+1

On Wed, Oct 19, 2016 at 2:07 PM, sfesc...@gmail.com 
wrote:

> Wow, thank you for doing this. This sentiment regarding stability seems to
> be widespread. Is the team reconsidering the whole tick-tock cadence? If
> not, I would add my voice to those asking that it be revisited.
>
> Steve
>
> On Wed, Oct 19, 2016 at 1:00 PM Matija Gobec  wrote:
>
>> Hi Ben,
>>
>> Thanks for this awesome contribution. I'm eager to give it a try and test
>> it out.
>>
>> Best,
>> Matija
>>
>> On Wed, Oct 19, 2016 at 8:55 PM, Ben Bromhead 
>> wrote:
>>
>> Hi All
>>
>> I am proud to announce we are making available our production build of
>> Cassandra 3.7 that we run at Instaclustr (both for ourselves and our
>> customers). Our release of Cassandra 3.7 includes a number of backported
>> patches from later versions of Cassandra e.g. 3.8 and 3.9 but doesn't
>> include the new features of these releases.
>>
>> You can find our release of Cassandra 3.7 LTS on github here (
>> https://github.com/instaclustr/cassandra). You can read more of our
>> thinking and how this applies to our managed service here (
>> https://www.instaclustr.com/blog/2016/10/19/patched-cassandra-3-7/).
>>
>> We also have an expanded FAQ about why and how we are approaching 3.x in
>> this manner (https://github.com/instaclustr/cassandra#cassandra-37-lts),
>> however I've included the top few questions and answers below:
>>
>> *Is this a fork?*
>> No, this is just Cassandra with a different release cadence for those who
>> want 3.x features but are slightly more risk averse than the current
>> schedule allows.
>>
>> *Why not just use the official release?*
>> With the 3.x tick-tock branch we have encountered more instability than
>> with the previous release cadence. We feel that releasing new features
>> every other release makes it very hard for operators to stabilize their
>> production environment without bringing in brand new features that are not
>> battle tested. With the release of Cassandra 3.8 and 3.9 simultaneously the
>> bug fix branch included new and real-world untested features, specifically
>> CDC. We have decided to stick with Cassandra 3.7 and instead backport
>> critical issues and maintain it ourselves rather than trying to stick with
>> the current Apache Cassandra release cadence.
>>
>> *Why backport?*
>> At Instaclustr we support and run a number of different versions of
>> Apache Cassandra on behalf of our customers. Over the course of managing
>> Cassandra for our customers we often encounter bugs. There are existing
>> patches for some of them, others we patch ourselves. Generally, if we can,
>> we try to wait for the next official Apache Cassandra release; however, to
>> ensure our customers remain stable and running we will
>> sometimes backport bug fixes and write our own hotfixes (which are also
>> submitted back to the community).
>>
>> *Why release it?*
>> A number of our customers and people in the community have asked if we
>> would make this available, which we are more than happy to do. This
>> repository represents what Instaclustr runs in production for Cassandra 3.7
>> and this is our way of helping the community get a similar level of
>> stability as what you would get from our managed service.
>>
>> Cheers
>>
>> Ben
>>
>>
>>
>> --
>> Ben Bromhead
>> CTO | Instaclustr 
>> +1 650 284 9692
>> Managed Cassandra / Spark on AWS, Azure and Softlayer
>>
>>
>>


Re: Adding Materialized View triggers "Mutation Too Large" error.

2016-08-08 Thread Jason J. W. Williams
Hi Paulo,

Thanks for the bug link! I've subscribed to it for updates. Thankfully, these
tables are for something still in beta, so we can wipe the data. But it'll be
great when it's fixed.

-J

On Mon, Aug 8, 2016 at 4:26 PM, Paulo Motta <pauloricard...@gmail.com>
wrote:

> What happens is that when trying to rebuild the MV, the rebuilder tries to
> create a very large batch that exceeds commitlog_segment_size_in_mb. This
> limitation is currently being addressed on CASSANDRA-11670. Two options I
> can see to workaround this for now:
> 1) increase commitlog_segment_size_in_mb as you suggested
> 2) Re-create the base table and views and reinsert all data again to the
> base table.
>
> 2016-08-08 19:52 GMT-03:00 Jason J. W. Williams <jasonjwwilli...@gmail.com
> >:
>
>> HI Guys,
>>
>> We're running Cassandra 3.0.8, and needed to add a field to a table and
>> its materialized views. We dropped the MVs, added the fields to the
>> underlying table, and recreated MVs with the new fields. However, the MV
>> creation is failing with:
>>
>>
>> WARN  [CompactionExecutor:6933] 2016-08-08 22:48:11,075
>> ViewBuilder.java:178 - Materialized View failed to complete, sleeping 5
>> minutes before restarting
>> java.lang.IllegalArgumentException: Mutation of 17709054 bytes is too
>> large for the maximum size of 16777216
>>
>> https://gist.github.com/williamsjj/49e4c6a0b3324343205386d040560e1e
>>
>> I'm considering bumping commitlog_segment_size_in_mb up to 64MB to get
>> around this, but it seems like the MV code should be smart enough to stay
>> under the limit.
>>
>> Any advice is greatly appreciated.
>>
>> -J
>>
>
>


Adding Materialized View triggers "Mutation Too Large" error.

2016-08-08 Thread Jason J. W. Williams
HI Guys,

We're running Cassandra 3.0.8, and needed to add a field to a table and
its materialized views. We dropped the MVs, added the fields to the
underlying table, and recreated MVs with the new fields. However, the MV
creation is failing with:


WARN  [CompactionExecutor:6933] 2016-08-08 22:48:11,075
ViewBuilder.java:178 - Materialized View failed to complete, sleeping 5
minutes before restarting
java.lang.IllegalArgumentException: Mutation of 17709054 bytes is too large
for the maximum size of 16777216

https://gist.github.com/williamsjj/49e4c6a0b3324343205386d040560e1e

I'm considering bumping commitlog_segment_size_in_mb up to 64MB to get
around this, but it seems like the MV code should be smart enough to stay
under the limit.
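
For what it's worth, my understanding is that the 16777216-byte cap in the error is simply half of
the default 32 MB commitlog segment (the maximum mutation size defaults to half the segment size),
so raising the segment size raises the cap with it. Roughly, assuming the stock config location:

    grep commitlog_segment_size_in_mb /etc/cassandra/cassandra.yaml
    # default: commitlog_segment_size_in_mb: 32  ->  max mutation ~16 MiB (16777216 bytes)
    # setting it to 64 on every node (with a rolling restart) would allow mutations up to ~32 MiB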

Any advice is greatly appreciated.

-J


Re: DTCS SSTable count issue

2016-07-11 Thread Jason J. W. Williams
I can vouch for TWCS...we switched from DTCS to TWCS using Jeff's plugin w/
Cassandra 3.0.5 and just upgraded to 3.0.8 today and switched over to the
built-in version of TWCS.

-J

On Mon, Jul 11, 2016 at 1:38 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> DTCS is deprecated in favor of TWCS in new versions, yes.
>
>
>
> Worth mentioning that you can NOT disable blocking read repair which comes
> naturally if you use CL > ONE.
>
>
>
> >  Also instead of major compactions (which comes with its set of issues
> / tradeoffs too) you can think of a script smartly using sstablemetadata to
> find the sstables holding too much tombstones and running single SSTable
> compactions on them through JMX and user defined compactions. Meanwhile if
> you want to do it manually, you could do it with something like this to
> know the tombstone ratio from the biggest sstable:
>
>
>
> The tombstone compaction options basically do this for you for the right
> settings (unchecked tombstone compaction = true, set threshold to 85% or
> so, don’t try to get clever and set it to something very close to 99%, the
> estimated tombstone ratio isn’t that accurate)
>
>
>
> -  Jeff
>
>
>
>
>
> *From: *Alain RODRIGUEZ <arodr...@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Monday, July 11, 2016 at 1:05 PM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: DTCS SSTable count issue
>
>
>
> @Jeff
>
>
>
> Rather than being an alternative, isn't your compaction strategy going to
> deprecate (and finally replace) DTCS ? That was my understanding from the
> ticket CASSANDRA-9666.
>
> @Riccardo
>
>
>
> If you are interested in TWCS from Jeff, I believe it has been introduced
> in 3.0.8 actually, not 3.0.7
> https://github.com/apache/cassandra/blob/cassandra-3.0/CHANGES.txt#L28.
> Anyway, you can use it in any recent version as compactions strategies are
> pluggable.
>
>
>
> What concerns me is that I have a high tombstone read count even though those
> are insert-only tables. Compacting the table makes the tombstone issue
> disappear. Yes, we are using TTL to expire data after 3 months and I have
> not touched the GC grace period.
>
>
>
> I observed the same issue recently and I am confident that TWCS will solve
> this tombstone issue, but it is not tested on my side so far. Meanwhile, be
> sure you have disabled any "read repair" on tables using DTCS and maybe
> hints as well. It is a hard decision to take as you'll lose 2 out of 3
> anti-entropy systems, but DTCS behaves badly with those options turned on
> (TWCS is fine with it). The last anti-entropy being a full repair that you
> might already not be running as you only do inserts...
>
>
>
> Also instead of major compactions (which comes with its set of issues /
> tradeoffs too) you can think of a script smartly using sstablemetadata to
> find the sstables holding too much tombstones and running single SSTable
> compactions on them through JMX and user defined compactions. Meanwhile if
> you want to do it manually, you could do it with something like this to
> know the tombstone ratio from the biggest sstable:
>
> du -sh /path_to_a_table/* | sort -h | tail -20 | awk '{print $1}' && du
> -sh /path_to_a_table/* | sort -h | tail -20 | awk '{print $2}' | xargs
> sstablemetadata | grep tombstones
>
> And something like this to run a user defined compaction on the ones you
> chose (big sstable with high tombstone ratio):
>
> echo "run -b org.apache.cassandra.db:type=CompactionManager
> forceUserDefinedCompaction " | java -jar
> jmxterm-version.jar -l :
>
> *note:* you have to download jmxterm (or use any other jmx tool).
>
>
>
> Did you give unchecked_tombstone_compaction a try as well
> (compaction options at the table level)? Feel free to set this one to true.
> I think it could be the default. It is safe as long as your machines have
> some more resources available (not that much). That's the first thing I
> would do.
>
>
>
> Also if you use TTL only, feel free to reduce the gc_grace_seconds, this
> will probably help having tombstones removed. I would start with other
> solutions first. Keep in mind that if someday you perform deletes, this
> setti

Adding column to materialized view

2016-06-27 Thread Jason J. W. Williams
Hey Guys,

Running Cassandra 3.0.5. Needed to add a column to a materialized view, but
ALTER MATERIALIZED VIEW doesn't seem to allow that. So we ended up dropping
the view and recreating it. Is that expected or did I miss something in the
docs?
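
For anyone who finds this later, the drop-and-recreate amounts to something like the following —
keyspace, view and column names are made-up placeholders, not our real schema:

    cqlsh -e "DROP MATERIALIZED VIEW ks.items_by_owner;"
    cqlsh -e "CREATE MATERIALIZED VIEW ks.items_by_owner AS
                SELECT owner, id, new_col FROM ks.items
                WHERE owner IS NOT NULL AND id IS NOT NULL
                PRIMARY KEY (owner, id);"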

-J


Re: Understanding when Cassandra drops expired time series data

2016-06-17 Thread Jason J. W. Williams
Hey Jeff,

Do most of those behaviors apply to TWCS too?

-J

On Fri, Jun 17, 2016 at 1:25 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> First, DTCS in 2.0.15 has some weird behaviors -
> https://issues.apache.org/jira/browse/CASSANDRA-9572 .
>
>
>
> That said, some other general notes:
>
>
> Data deleted by TTL isn’t the same as issuing a delete – each expiring
> cell internally has a ttl/timestamp at which it will be converted into a
> tombstone. There is no tombstone added to the memtable, or flushed to disk
> – it just treats the expired cells as tombstones once they’re past that
> timestamp.
>
>
>
>
> Cassandra’s getFullyExpiredSSTables() will consider a table fully expired
> if (and only if) all cells within that table are expired (current time >
> max timestamp ) AND the sstable timestamps don’t overlap with others that
> aren’t fully expired. Björn talks about this in
> https://issues.apache.org/jira/browse/CASSANDRA-8243 - the intent here is
> so that explicit deletes (which do create tombstones) won’t be GC’d  from
> an otherwise fully expired sstable if they’re covering data in a more
> recent sstable – without this check, we could accidentally bring dead data
> back to life. In an append only time series workload this would be unusual,
> but not impossible.
>
> Unfortunately, read repairs (foreground/blocking, if you write with CL <
> ALL and read with CL > ONE) will cause cells written with old timestamps to
> be written into the newly flushed sstables, which creates sstables with
> wide gaps between minTimestamp and maxTimestamp (you could have a read
> repair pull data that is 23 hours old into a new sstable, and now that one
> sstable spans 23 hours, and isn’t fully expired until the oldest data is 47
> hours old). There’s an open ticket (
> https://issues.apache.org/jira/browse/CASSANDRA-10496 ) meant to make
> this behavior ‘better’ in the future by splitting those old read-repaired
> cells from the newly flushed sstables.
>
>
>
>
> I gave a talk on a lot of this behavior last year at Summit (
> http://www.slideshare.net/JeffJirsa1/cassandra-summit-2015-real-world-dtcs-for-operators
> ) - if you’re running time series in production on DTCS, it’s worth a
> glance.
>
>
>
>
>
>
>
> *From: *jerome <jeromefroel...@hotmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Friday, June 17, 2016 at 11:52 AM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Understanding when Cassandra drops expired time series data
>
>
>
> Hello! Recently I have been trying to familiarize myself with Cassandra
> but don't quite understand when data is removed from disk after it has been
> deleted. The use case I'm particularly interested is expiring time series
> data with DTCS. As an example, I created the following table:
>
> CREATE TABLE metrics (
>   metric_id text,
>   time timestamp,
>   value double,
>   PRIMARY KEY (metric_id, time),
> ) WITH CLUSTERING ORDER BY (time DESC) AND
>  default_time_to_live = 86400 AND
>  gc_grace_seconds = 3600 AND
>  compaction = {
>   'class': 'DateTieredCompactionStrategy',
>   'timestamp_resolution':'MICROSECONDS',
>   'base_time_seconds':'3600',
>   'max_sstable_age_days':'365',
>   'min_threshold':'4'
>  };
>
> I understand that Cassandra will create a tombstone for all rows inserted
> into this table 24 hours after they are inserted (86400 seconds). These
> tombstones will first be written to an in-memory Memtable and then flushed
> to disk as an SSTable when the Memtable reaches a certain size. My question
> is when will the data that is now expired be removed from disk? Is it the
> next time the SSTable which contains the data gets compacted? So, with DTCS
> and min_threshold set to four, we would wait until at least three other
> SSTables are in the same time window as the expired data, and then those
> SSTables will be compacted into a SSTable without the expired data. Is it
> only during this compaction that the data will be removed? It seems to me
> that this would require Cassandra to maintain some metadata on which rows
> have been deleted since the newer tombstones would likely not be in the
> older SSTables that are being compacted. Also, I'm aware that Cassandra can
> drop entire SSTables if they contain only expired data but I'm unsure of
> what qualifies as expired data (is it just SSTables whose maximum
> timestamp is past the default TTL for the table?) and when such SSTables
> are dropped.
>
> Alternatively, do the SSTables wh

Re: Multi-DC Cluster w/ non-replicated Keyspace

2016-06-16 Thread Jason J. W. Williams
It was my mistake. Before I checked the trace I assumed it meant the data
also had been replicated to the remote cluster, which is why it could
answer the request. Thank you for responding so quickly and helping correct
my misunderstanding. As long as the data isn't being replicated, everything
else is fine.

-J

On Thu, Jun 16, 2016 at 8:05 PM, Ben Slater <ben.sla...@instaclustr.com>
wrote:

> That’s the behaviour I would have expected. I’m not aware of anyway to
> prevent this and would be surprised if there is one (but I’ve never tried
> to find one either so it might be possible).
>
> Cheers
> Ben
>
> On Fri, 17 Jun 2016 at 12:02 Jason J. W. Williams <
> jasonjwwilli...@gmail.com> wrote:
>
>> Hey Ben,
>>
>> Looks like just the schema. I was surprised that running SELECTs against
>> the DC which should not have any data (because it's not specified in
>> NetworkTopologyStrategy), actually returned data. But looking at the query
>> trace it looks like its forwarding the queries to the other DC.
>>
>> -J
>>
>> On Thu, Jun 16, 2016 at 7:55 PM, Ben Slater <ben.sla...@instaclustr.com>
>> wrote:
>>
>>> Do you mean the data is getting replicated or just the schema?
>>>
>>> On Fri, 17 Jun 2016 at 11:48 Jason J. W. Williams <
>>> jasonjwwilli...@gmail.com> wrote:
>>>
>>>> Hi Guys,
>>>>
>>>> We have a 2 DC cluster where the keyspaces are replicated between the
>>>> 2. Is it possible to add a keyspace to one of the DCs that won't be
>>>> replicated to the other?
>>>>
>>>> Whenever we add a new keyspace it seems to get replicated even if we
>>>> don't specify the other DC in the keyspace's NetworkTopologyStrategy.
>>>>
>>>> -J
>>>>
>>> --
>>> 
>>> Ben Slater
>>> Chief Product Officer
>>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>> +61 437 929 798
>>>
>>
>> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
>


Re: Multi-DC Cluster w/ non-replicated Keyspace

2016-06-16 Thread Jason J. W. Williams
Hey Ben,

Looks like just the schema. I was surprised that running SELECTs against
the DC which should not have any data (because it's not specified in
NetworkTopologyStrategy), actually returned data. But looking at the query
trace it looks like its forwarding the queries to the other DC.

-J

On Thu, Jun 16, 2016 at 7:55 PM, Ben Slater <ben.sla...@instaclustr.com>
wrote:

> Do you mean the data is getting replicated or just the schema?
>
> On Fri, 17 Jun 2016 at 11:48 Jason J. W. Williams <
> jasonjwwilli...@gmail.com> wrote:
>
>> Hi Guys,
>>
>> We have a 2 DC cluster where the keyspaces are replicated between the 2.
>> Is it possible to add a keyspace to one of the DCs that won't be replicated
>> to the other?
>>
>> Whenever we add a new keyspace it seems to get replicated even if we
>> don't specify the other DC in the keyspace's NetworkTopologyStrategy.
>>
>> -J
>>
> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
>


Multi-DC Cluster w/ non-replicated Keyspace

2016-06-16 Thread Jason J. W. Williams
Hi Guys,

We have a 2 DC cluster where the keyspaces are replicated between the 2. Is
it possible to add a keyspace to one of the DCs that won't be replicated to
the other?

Whenever we add a new keyspace it seems to get replicated even if we don't
specify the other DC in the keyspace's NetworkTopologyStrategy.
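
For context, the keyspace is created with only one DC in the replication map, roughly like this
(the DC name and RF are placeholders):

    cqlsh -e "CREATE KEYSPACE local_only
              WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};"
    # the schema definition still propagates to every node in the cluster,
    # but data replicas are only placed in dc1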

-J


Re: Most stable version?

2016-04-22 Thread Jason J. W. Williams
Thanks for the advice Carlos. Do appreciate it.

-J

On Fri, Apr 22, 2016 at 1:23 PM, Carlos Rolo <r...@pythian.com> wrote:

> I do expect 3 to get stable at some point; according to the documentation it
> will be the 3.0.x series. As for the current 3.x tick-tock, I would recommend
> jumping into it when Datastax does. Otherwise, maybe 4 might get stable and
> we could be following release cycles similar to some software out there:
> evens stable (2 and 4), odds unstable (3 and 5). But this is my
> guess. Wait for a DSE release on 3.x and use that.
>
> I had problems in earlier 2.2, 2.2.5 seems to be a solid release, but I
> will wait for 2.2.6 before recommending for production. Just to be safe :)
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
> <http://linkedin.com/in/carlosjuzarterolo>*
> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
> www.pythian.com
>
> On Fri, Apr 22, 2016 at 6:42 PM, Jason Williams <jasonjwwilli...@gmail.com
> > wrote:
>
>> Hi Carlos,
>>
>> I read your blog post (actually almost everything I can find on tick
>> tock). My understanding has been tick tock will be the only versioning
>> going forward.
>>
>> Or are you suggesting at some point there will be a stable train for 3?
>> (or that 3.x will be bumped to 4.0 when stable)?
>>
>> We're on 2.2.5 and haven't seen any major problems with it.
>>
>> -J
>>
>>
>>
>> Sent via iPhone
>>
>> On Apr 22, 2016, at 03:34, Carlos Rolo <r...@pythian.com> wrote:
>>
>> If you need SASI, you need to use 3.4+. 3.x will always be "unstable" (It
>> is explained why in my blog post). You get those odd versions, but it is
>> not a solid effort to stabilize the platform, otherwise devs would not jump
>> to 3.6, and keep working on 3.5. And then you get 3.7, which might fix some
>> issues of 3.4+, but next month you get 3.8 unstable again... I'm waiting to
>> see where this is going. I only had bad experiences with 3.x series atm.
>>
>> If you want stability (and no new features), you would use 2.1.13.
>>
>> 2.2.x is kind of a mixed bag, no really huge improvements over 2.1.x
>> series and it is still having some issues, so I would stick to 2.1.x
>> series.
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Linkedin: 
>> *linkedin.com/in/carlosjuzarterolo
>> <http://linkedin.com/in/carlosjuzarterolo>*
>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>> www.pythian.com
>>
>> On Fri, Apr 22, 2016 at 10:16 AM, Jason Williams <
>> jasonjwwilli...@gmail.com> wrote:
>>
>>> My reading of the tick-tock cycle is that we've moved from a stable
>>> train that receives mostly bug fixes until the next major stable, to one
>>> where every odd minor version is bug-fix-only... likely mostly for the
>>> previous even. The goal being a relatively continuously stable code base in
>>> odd minor versions.
>>>
>>> In that environment where there is no "stable" train, would the right
>>> approach be to pick the feature set needed and then choose the odd minor
>>> where that feature set had been stable for 2-3 previous odd minors.
>>>
>>> For example, SASI was added in 3.4, so 3.5 is the first bug fix only
>>> (odd minor) containing it. By the logic above you wouldn't want to use SASI
>>> in production until 3.9 or later. Or is my logic about how to treat
>>> tick-tock off base?
>>>
>>> -J
>>>
>>>
>>> Sent via iPhone
>>>
>>> On Apr 22, 2016, at 01:46, Satoshi Hikida <sahik...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I'm also looking for the most stable version of Cassandra. I
>>> read Carlos's blog post. According to his article, I guess 2.1.x is the
>>> most stable version, is that right? I prefer the most stable version
>>> rather than many advanced features. To satisfy my purpose, should I use
>>> 2.1.x, or is the latest 2.2.x recommended?
>>>
>>> Currently I use 2.2.5, but is the latest 2.1.13 recommended for
>>> production use?
>>>
>>> Regards,
>>> Satoshi
>>>
>>>
>>> On Mon, Apr 

Re: Optional TLS CQL Encryption

2016-04-20 Thread Jason J. W. Williams
Hi Ben,

Thanks for confirming what I saw occur. The Datastax drivers don't play
very nicely with Twisted Python so connection pooling is inconsistent and
makes always-on TLS a no-go performance-wise. The encryption overhead isn't
the problem, it's the build-up of the TLS session for every connection when
connection pooling is not working as needed. That said it is still
beneficial to be able to enforce TLS for remote access...MySQL allows you
to enforce TLS on a per-user basis for example.

If someone has been successful not wrapping the Datastax drivers in
deferToThread calls when using Twisted I'd appreciate insight on how you
got that working because its pretty much undocumented.

-J

On Tue, Apr 19, 2016 at 11:46 PM, Ben Bromhead <b...@instaclustr.com> wrote:

> Hi Jason
>
> If you enable encryption it will be always on. Optional encryption is
> generally a bad idea (tm). Also always creating a new session every query
> is also a bad idea (tm) even without the minimal overhead of encryption.
>
> If you are really hell bent on doing this you could have a node that is
> part of the cluster but has -Dcassandra.join_ring=false set in jvm
> options in cassandra-env.sh so it does not get any data and configure
> that to have no encryption enabled. This is known as a fat client. Then
> connect to that specific node whenever you want to do terrible non
> encrypted things.
>
> Having said all that, please don't do this.
>
> Cheers
>
> On Tue, 19 Apr 2016 at 15:32 Jason J. W. Williams <
> jasonjwwilli...@gmail.com> wrote:
>
>> Hey Guys,
>>
>> Is there a way to make TLS encryption optional for the CQL listener? We'd
>> like to be able to use it for remote management connections but not for
>> same-datacenter usage (since the build-up/tear-down cost is too high for things
>> that don't use pools).
>>
>> Right now it appears if we enable encryption it requires it for all
>> connections, which definitely is not what we want.
>>
>> -J
>>
> --
> Ben Bromhead
> CTO | Instaclustr <https://www.instaclustr.com/>
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer
>


Optional TLS CQL Encryption

2016-04-19 Thread Jason J. W. Williams
Hey Guys,

Is there a way to make TLS encryption optional for the CQL listener? We'd
like to be able to use it for remote management connections but not for
same-datacenter usage (since the build-up/tear-down cost is too high for things
that don't use pools).

Right now it appears if we enable encryption it requires it for all
connections, which definitely is not what we want.

-J


Re: Cassandra security using openssl or keytool

2015-10-29 Thread Jason J. W. Williams
>
> Google words like :
>
> "
> import openssl private key into keytool
> "
>
> Find results like :
>
>
> http://stackoverflow.com/questions/906402/importing-an-existing-x509-certificate-and-private-key-in-java-keystore-to-use-i/8224863#8224863
>
>
I wasted 4-5 hours of my life recently importing an OpenSSL key in a PEM
into a Cassandra keystore using exactly that article as a starting point
(the server's hostname already had a certificate and key in our ops CA, and
for various reasons we didn't want to revoke and reissue it.).

Even when you get the key imported, keytool will then frequently refuse to
pair that key entry with the certificate when you import the
certificate...and it will instead store the certificate in a new keystore
entry. Which won't work because the alias names on the keystore entries for
the key and certificate will be different (you need one entry storing both
key and certificate).  I did _finally_ get it to work but I can't tell you
how I did it...it was a lot of manually editing PEM files, converting them
to DERs and then trying every possible combination of keytool import flags.
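
If I had to do it again, I believe the cleaner path (and what most of those articles boil down to)
is to bundle the key and certificate into a single PKCS12 entry first, so keytool imports them as
one paired alias. A rough sketch only — filenames, aliases and passwords below are placeholders:

    # combine the existing PEM key + certificate (and CA chain) into one PKCS12 entry
    openssl pkcs12 -export -in node.crt.pem -inkey node.key.pem -certfile ca-chain.pem \
            -name node1 -out node.p12 -passout pass:changeit

    # convert the PKCS12 store into the JKS keystore Cassandra expects
    keytool -importkeystore -srckeystore node.p12 -srcstoretype PKCS12 -srcstorepass changeit \
            -destkeystore keystore.jks -deststoretype JKS -deststorepass changeit \
            -srcalias node1 -destalias node1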

-J


Re: Cassandra security using openssl or keytool

2015-10-29 Thread Jason J. W. Williams
>
> I certainly don't vouch for the advisability of attempting a task you've
> described as a "real pain" ... but if OP wants/needs to, it's their
> funeral? :D
>

Agreed. I just wanted to elaborate what a "real pain" meant so OP would
know I wasn't just blowing him off.

-J


[ANNOUNCE] YCSB 0.4.0 Release

2015-10-22 Thread Robert J. Moore
On behalf of the development community, I am pleased to announce the 
release of YCSB 0.4.0.


Highlights:

* Default measurement changed from histogram to hdrhistogram.
* Users who want the previous behavior can set the 'measurementtype' 
property to 'histogram' (see the example after this list).
* Reported 95th and 99th percentile latencies now in microseconds 
(previously in milliseconds).
* The HBase Binding has been split into 3 separate bindings based on 
your version of HBase and

the names have changed to hbase10, hbase098, and hbase094.
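
Regarding the 'measurementtype' note above, restoring the old behavior is just a property override
on the command line, e.g. (shown with the built-in 'basic' binding as a stand-in for whichever
binding you actually run):

    bin/ycsb run basic -P workloads/workloada -p measurementtype=histogram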

Bug Fixes:

* Previously, with hdrhistogram, the 95th percentile actually reported 
the 90th percentile value. It now reports the actual 95th percentile value.

* Fixed a race condition between insert and read/update operations.


Full release notes, including links to source and convenience binaries:

https://github.com/brianfrankcooper/YCSB/releases/tag/0.4.0

This release covers changes from the last 2 months.

--
Rob


Re: Verifying internode SSL

2015-10-13 Thread Jason J. W. Williams
Awesome. Thanks Nate!

On Tue, Oct 13, 2015 at 10:32 AM, Nate McCall 
wrote:

> > I've configured internode SSL and set it to be used between datacenters
> only. Is there a way in the logs to verify SSL is operating between nodes
> in different DCs or do I need to break out tcpdump?
> >
>
> Even on DC only encryption, you should see the following message in the
> log:
>
> "Starting Encrypted Messaging Service on SSL port 7001"
>
> With any Java-based thing using SSL, you can always use the following
> startup parameter to find out exactly what is going on:
>
> -Djavax.net.debug=ssl
>
> This page will tell you how to interpret the debug output:
>
> http://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/ReadDebug.html
>
> --
> -
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>


Re: Unbalanced disk load

2015-07-18 Thread J. Ryan Earl
Even with https://issues.apache.org/jira/browse/CASSANDRA-7386 data
balancing across JBOD setups is pretty horrible.  Having used JBOD for
about 2 years from 1.2.x and up, it is my opinion JBOD on Cassandra is
nascent at best and far from mature.  For a variety of reasons, JBOD should
perform better if IO and data is balanced across multiple devices due to
things like linux device queues, striping overhead, access contention, and
so forth.  However, data and access patterns simply are not balanced in
Cassandra JBOD setups.

Here's an example of what we see on one of our nodes:

/dev/sdd1 1.1T  202G  915G  19% /data/2
/dev/sde1 1.1T  136G  982G  13% /data/3
/dev/sdi1 1.1T  217G  901G  20% /data/7
/dev/sdc1 1.1T  402G  715G  36% /data/1
/dev/sdh1 1.1T  187G  931G  17% /data/6
/dev/sdf1 1.1T  201G  917G  18% /data/4
/dev/sdg1 1.1T  154G  963G  14% /data/5

Essentially, for a storage engine to make good use of JBOD, like HDFS or
Ceph does, the storage engines need to essentially be designed from the
ground up to use JBOD.  In Cassandra, a single sstable cannot be split at
the storage engine level across members of the JBOD.  In our
case, we have a single sstable file that is bigger than all the data files
combined on other disks.  Looking at the disk using 402G, we see:

274G vcells_polished_data-vcells_polished-jb-38985-Data.db

A single sstable is using 274G.  In addition to data usage imbalance, we
see hot spots as well.  With static fields in particular, and CFs that
don't change much, you'll get CFs that end up compacting into a small number
of large sstables.  With most of the data for a CF being in one sstable and
on one data volume, a single data volume then becomes a hotspot for reads
on that CF.  Cassandra tries to minimize the number of sstables a row will
be written across, but in particular after some compaction on an CFs that
are rarely updated, most of the data for a CF can end up in a single
sstable, and sstables aren't split across data volumes.  Thus a single volume
will be a hot-spot for access to that CF in a JBOD setup as Cassandra does
not effectively distribute data across individual volumes in all
circumstances.
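
A quick way to see this kind of skew on a node is to compare the volumes and hunt for the outsized
sstables — something like the following, with the data paths being examples from our layout:

    for d in /data/*; do du -sh "$d"; done                                  # per-volume footprint
    find /data/*/ -name '*-Data.db' -printf '%s %p\n' | sort -rn | head    # largest sstables and which volume holds them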

There may be tuning which would help this, but it's specific to JBOD and
not something that you would have to worry about in a single data volume
setup, ie RAID0.  With a RAID0, the downside of course, is that losing a
single member disk to the RAID0 takes the node down.  The upside is you
don't have to worry about the imbalance of both I/O and data footprint
across individual volumes.

Unlike HDFS, Ceph, and RAID for that matter, where you're dealing with
maximum fixed sizes blocks/stripes that are then distributed at a granular
level across the JBOD volumes, Cassandra is dealing with uncapped, low
granularity, variable sized sstable data files which it attempts to
distribute across JBOD volumes making JBOD far from ideal.  Frankly, it's
hard for me to imagine any columnar data store doing JBOD well.

On Fri, Jul 17, 2015 at 4:08 PM, Soerian Lieve sli...@liveramp.com wrote:

 Hi,

 I am currently benchmarking Cassandra with three machines, and on each
 machine I am seeing an unbalanced distribution of data among the data
 directories (1 per disk).
 I am concerned that this affects my write performance, is there anything
 that I can make the distribution be more even? Would raid0 be my best
 option?

 Details:
 3 machines, each have 24 cores, 64GB of RAM, 7 SSDs of 500GB each.
 Commitlog is on a separate disk, cassandra.yaml configured according to
 Datastax' guide on cassandra.yaml.
 Total size of data is about 2TB, 14B records, all unique. Replication
 factor of 1.

 Thanks,
 Soerian



Re: error='Cannot allocate memory' (errno=12)

2015-05-12 Thread J. Ryan Earl
What's your ulimit -a output?  Did you adjust nproc and nofile ulimits up?
Do you have JNA installed?  What about the memlock ulimit and kernel.shmmax
in sysctl.conf?

What's in cassandra.log?

On Mon, May 11, 2015 at 7:24 AM, Rahul Bhardwaj 
rahul.bhard...@indiamart.com wrote:

 Hi Robert,

 I saw you answering the same problem somewhere, but no solution was found.
 Please check again.


 regards:
 Rahul Bhardwaj




 On Mon, May 11, 2015 at 5:49 PM, Rahul Bhardwaj 
 rahul.bhard...@indiamart.com wrote:

 But it is giving the same error with Java 7 and OpenJDK

 On Mon, May 11, 2015 at 5:26 PM, Anishek Agarwal anis...@gmail.com
 wrote:

  Well, I haven't used 2.1.x Cassandra nor Java 8, but is there any reason for not
  using the Oracle JDK, as I thought that's what is recommended? I saw a thread
  earlier stating Java 8 with 2.0.14+ Cassandra is tested, but I'm not sure about
  the 2.1.x versions.


 On Mon, May 11, 2015 at 4:04 PM, Rahul Bhardwaj 
 rahul.bhard...@indiamart.com wrote:

 PFA of error log:
  hs_err_pid9656.log
 https://docs.google.com/a/indiamart.com/file/d/0B0hlSlesIPVfaU9peGwxSXdsZGc/edit?usp=drive_web

 On Mon, May 11, 2015 at 3:58 PM, Rahul Bhardwaj 
 rahul.bhard...@indiamart.com wrote:

 free RAM:


 free -m
  total   used   free sharedbuffers
 cached
 Mem: 64398  23753  40644  0108
 8324
 -/+ buffers/cache:  15319  49078
 Swap: 2925 15   2909


  ulimit -a
 core file size  (blocks, -c) 0
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 0
 file size   (blocks, -f) unlimited
 pending signals (-i) 515041
 max locked memory   (kbytes, -l) 64
 max memory size (kbytes, -m) unlimited
 open files  (-n) 1024
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 10240
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 515041
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited


 Also attaching complete error file


 On Mon, May 11, 2015 at 3:35 PM, Anishek Agarwal anis...@gmail.com
 wrote:

 the memory cassandra is trying to allocate is pretty small. you sure
 there is no hardware failure on the machine. what is the free ram on the
 box ?

 On Mon, May 11, 2015 at 3:28 PM, Rahul Bhardwaj 
 rahul.bhard...@indiamart.com wrote:

 Hi All,

 We have cluster of 3 nodes with  64GB RAM each. My cluster was
 running in healthy state. Suddenly one machine's cassandra daemon stops
 working and shut down.

 On restarting it after 2 minutes it again stops and is getting stop
 after returning below error in cassandra.log

 Java HotSpot(TM) 64-Bit Server VM warning: INFO:
 os::commit_memory(0x7fd064dc6000, 12288, 0) failed; error='Cannot
 allocate memory' (errno=12)
 #
 # There is insufficient memory for the Java Runtime Environment to
 continue.
 # Native memory allocation (malloc) failed to allocate 12288 bytes
 for committing reserved memory.
 # An error report file with more information is saved as:
 # /tmp/hs_err_pid23215.log
 INFO  09:50:41 Loading settings from
 file:/etc/cassandra/default.conf/cassandra.yaml
 INFO  09:50:41 Node
 configuration:[authenticator=AllowAllAuthenticator;
 authorizer=AllowAllAuthorizer; auto_snapshot=true;
 batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024;
 cas_contention_timeout_in_ms=1000; client_encryption_options=REDACTED;
 cluster_name=Test Cluster; column_index_size_in_kb=64;
 commit_failure_policy=stop;
 commitlog_directory=/var/lib/cassandra/commitlog;
 commitlog_segment_size_in_mb=64; commitlog_sync=periodic;
 commitlog_sync_period_in_ms=1; compaction_throughput_mb_per_sec=16;
 concurrent_compactors=4; concurrent_counter_writes=32; 
 concurrent_reads=32;
 concurrent_writes=32; counter_cache_save_period=7200;
 counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000;
 cross_node_timeout=false; 
 data_file_directories=[/var/lib/cassandra/data];
 disk_failure_policy=stop; dynamic_snitch_badness_threshold=0.1;
 dynamic_snitch_reset_interval_in_ms=60;
 dynamic_snitch_update_interval_in_ms=100;
 endpoint_snitch=GossipingPropertyFileSnitch; 
 hinted_handoff_enabled=true;
 hinted_handoff_throttle_in_kb=1024; incremental_backups=false;
 index_summary_capacity_in_mb=null;
 index_summary_resize_interval_in_minutes=60; inter_dc_tcp_nodelay=false;
 internode_compression=all; key_cache_save_period=14400;
 key_cache_size_in_mb=null; listen_address=null;
 max_hint_window_in_ms=1080; max_hints_delivery_threads=2;
 memtable_allocation_type=heap_buffers; native_transport_port=9042;
 num_tokens=256; partitioner=org.apache.cassandra.dht.Murmur3Partitioner;
 permissions_validity_in_ms=2000; range_request_timeout_in_ms=100;
 read_request_timeout_in_ms=9;
 

Re: error='Cannot allocate memory' (errno=12)

2015-05-12 Thread J. Ryan Earl
I see your ulimit -a above, missed that.  You should increase nofile
ulimit.  If you used JNA, you'd need to increase memlock too, but you
probably aren't using JNA.  1024 nofile is default and far too small, try
making that like 64K.  Thread handles can count against file descriptors
limits, similar to how pipes, sockets, etc can count against file
descriptor limits.
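
A sketch of what that could look like on EL6, assuming Cassandra runs as the "cassandra" user
(the values are illustrative, not gospel):

    # as root
    cat >> /etc/security/limits.d/cassandra.conf <<'EOF'
    cassandra  -  nofile   65536
    cassandra  -  nproc    65536
    cassandra  -  memlock  unlimited   # only matters if JNA is installed and memory locking is used
    EOF
    # then restart Cassandra; limits only apply to processes started after the change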

On Tue, May 12, 2015 at 7:55 PM, J. Ryan Earl o...@jryanearl.us wrote:

 What's your ulimit -a output?  Did you adjust nproc and nofile ulimits
 up?  Do you have JNA installed?  What about the memlock ulimit and
 kernel.shmmax in sysctl.conf?

 What's in cassandra.log?

 On Mon, May 11, 2015 at 7:24 AM, Rahul Bhardwaj 
 rahul.bhard...@indiamart.com wrote:

 Hi Robert,

 I saw you answering the same problem somewhere, but no solution was found.
 Please check again.


 regards:
 Rahul Bhardwaj




 On Mon, May 11, 2015 at 5:49 PM, Rahul Bhardwaj 
 rahul.bhard...@indiamart.com wrote:

 But it is giving the same error with Java 7 and OpenJDK

 On Mon, May 11, 2015 at 5:26 PM, Anishek Agarwal anis...@gmail.com
 wrote:

  Well, I haven't used 2.1.x Cassandra nor Java 8, but is there any reason for
  not using the Oracle JDK, as I thought that's what is recommended? I saw a thread
  earlier stating Java 8 with 2.0.14+ Cassandra is tested, but I'm not sure about
  the 2.1.x versions.


 On Mon, May 11, 2015 at 4:04 PM, Rahul Bhardwaj 
 rahul.bhard...@indiamart.com wrote:

 PFA of error log:
   hs_err_pid9656.log
  https://docs.google.com/a/indiamart.com/file/d/0B0hlSlesIPVfaU9peGwxSXdsZGc/edit?usp=drive_web

 On Mon, May 11, 2015 at 3:58 PM, Rahul Bhardwaj 
 rahul.bhard...@indiamart.com wrote:

 free RAM:


 free -m
  total   used   free sharedbuffers
 cached
 Mem: 64398  23753  40644  0108
 8324
 -/+ buffers/cache:  15319  49078
 Swap: 2925 15   2909


  ulimit -a
 core file size  (blocks, -c) 0
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 0
 file size   (blocks, -f) unlimited
 pending signals (-i) 515041
 max locked memory   (kbytes, -l) 64
 max memory size (kbytes, -m) unlimited
 open files  (-n) 1024
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 10240
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 515041
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited


 Also attaching complete error file


 On Mon, May 11, 2015 at 3:35 PM, Anishek Agarwal anis...@gmail.com
 wrote:

 the memory cassandra is trying to allocate is pretty small. you sure
 there is no hardware failure on the machine. what is the free ram on the
 box ?

 On Mon, May 11, 2015 at 3:28 PM, Rahul Bhardwaj 
 rahul.bhard...@indiamart.com wrote:

 Hi All,

 We have cluster of 3 nodes with  64GB RAM each. My cluster was
 running in healthy state. Suddenly one machine's cassandra daemon stops
 working and shut down.

 On restarting it after 2 minutes it again stops and is getting stop
 after returning below error in cassandra.log

 Java HotSpot(TM) 64-Bit Server VM warning: INFO:
 os::commit_memory(0x7fd064dc6000, 12288, 0) failed; error='Cannot
 allocate memory' (errno=12)
 #
 # There is insufficient memory for the Java Runtime Environment to
 continue.
 # Native memory allocation (malloc) failed to allocate 12288 bytes
 for committing reserved memory.
 # An error report file with more information is saved as:
 # /tmp/hs_err_pid23215.log
 INFO  09:50:41 Loading settings from
 file:/etc/cassandra/default.conf/cassandra.yaml
 INFO  09:50:41 Node
 configuration:[authenticator=AllowAllAuthenticator;
 authorizer=AllowAllAuthorizer; auto_snapshot=true;
 batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024;
 cas_contention_timeout_in_ms=1000; 
 client_encryption_options=REDACTED;
 cluster_name=Test Cluster; column_index_size_in_kb=64;
 commit_failure_policy=stop;
 commitlog_directory=/var/lib/cassandra/commitlog;
 commitlog_segment_size_in_mb=64; commitlog_sync=periodic;
 commitlog_sync_period_in_ms=1; compaction_throughput_mb_per_sec=16;
 concurrent_compactors=4; concurrent_counter_writes=32; 
 concurrent_reads=32;
 concurrent_writes=32; counter_cache_save_period=7200;
 counter_cache_size_in_mb=null; 
 counter_write_request_timeout_in_ms=5000;
 cross_node_timeout=false; 
 data_file_directories=[/var/lib/cassandra/data];
 disk_failure_policy=stop; dynamic_snitch_badness_threshold=0.1;
 dynamic_snitch_reset_interval_in_ms=60;
 dynamic_snitch_update_interval_in_ms=100;
 endpoint_snitch=GossipingPropertyFileSnitch; 
 hinted_handoff_enabled=true;
 hinted_handoff_throttle_in_kb=1024; incremental_backups=false;
 index_summary_capacity_in_mb=null;
 index_summary_resize_interval_in_minutes=60; 
 inter_dc_tcp_nodelay=false

Re: OOM and high SSTables count

2015-03-04 Thread J. Ryan Earl
We think it is this bug:
https://issues.apache.org/jira/browse/CASSANDRA-8860

We're rolling a patch to beta before rolling it into production.

On Wed, Mar 4, 2015 at 4:12 PM, graham sanderson gra...@vast.com wrote:

 We can confirm a problem on 2.1.3 (sadly our beta sstable state obviously
 did not match our production ones in some critical way)

 We have about 20k sstables on each of 6 nodes right now; actually a quick
 glance shows 15k of those are from OpsCenter, which may have something to
 do with beta/production mismatch

  I will look into the open OOM JIRA issue against 2.1.3 - we may be getting
 penalized for heavy use of JBOD (x7 per node)

 It also looks like 2.1.3 is leaking memory, though it eventually recovers
 via GCInspector causing a complete memtable flush.

 On Mar 4, 2015, at 12:31 PM, daemeon reiydelle daeme...@gmail.com wrote:

 Are you finding a correlation between the shards on the OOM DC1 nodes and
 the OOM DC2 nodes? Does your monitoring tool indicate that the DC1 nodes
 are using significantly more CPU (and memory) than the nodes that are NOT
 failing? I am leading you down the path to suspect that your sharding is
 giving you hot spots. Also are you using vnodes?

 Patrick


 On Wed, Mar 4, 2015 at 9:26 AM, Jan cne...@yahoo.com wrote:

 HI Roni;

 You mentioned:
 DC1 servers have 32GB of RAM and 10GB of HEAP. DC2 machines have 16GB of
 RAM and 5GB HEAP.

 Best practices would be be to:
 a)  have a consistent type of node across both DC's.  (CPUs, Memory,
 Heap  Disk)
 b)  increase heap on DC2 servers to be  8GB for C* Heap

 The leveled compaction issue is not addressed by this.
 hope this helps

 Jan/




   On Wednesday, March 4, 2015 8:41 AM, Roni Balthazar 
 ronibaltha...@gmail.com wrote:


 Hi there,

 We are running C* 2.1.3 cluster with 2 DataCenters: DC1: 30 Servers /
 DC2 - 10 Servers.
 DC1 servers have 32GB of RAM and 10GB of HEAP. DC2 machines have 16GB
 of RAM and 5GB HEAP.
 DC1 nodes have about 1.4TB of data and DC2 nodes 2.3TB.
 DC2 is used only for backup purposes. There are no reads on DC2.
  All writes and reads are on DC1 using LOCAL_ONE and the RF DC1: 2 and
 DC2: 1.
 All keyspaces have STCS (Average 20~30 SSTables count each table on
 both DCs) except one that is using LCS (DC1: Avg 4K~7K SSTables / DC2:
 Avg 3K~14K SSTables).

 Basically we are running into 2 problems:

 1) High SSTables count on keyspace using LCS (This KS has 500GB~600GB
 of data on each DC1 node).
 2) There are 2 servers on DC1 and 4 servers in DC2 that went down with
 the OOM error message below:

 ERROR [SharedPool-Worker-111] 2015-03-04 05:03:26,394
 JVMStabilityInspector.java:94 - JVM state determined to be unstable.
 Exiting forcefully due to:
 java.lang.OutOfMemoryError: Java heap space
 at
 org.apache.cassandra.db.composites.CompoundSparseCellNameType.copyAndMakeWith(CompoundSparseCellNameType.java:186)
 ~[apache-cassandra-2.1.3.jar:2.1.3]
 at
 org.apache.cassandra.db.composites.AbstractCompoundCellNameType$CompositeDeserializer.readNext(AbstractCompoundCellNameType.java:286)
 ~[apache-cassandra-2.1.3.jar:2.1.3]
 at
 org.apache.cassandra.db.AtomDeserializer.readNext(AtomDeserializer.java:104)
 ~[apache-cassandra-2.1.3.jar:2.1.3]
 at
 org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:426)
 ~[apache-cassandra-2.1.3.jar:2.1.3]
 at
 org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.fetchMoreData(IndexedSliceReader.java:350)
 ~[apache-cassandra-2.1.3.jar:2.1.3]
 at
 org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:142)
 ~[apache-cassandra-2.1.3.jar:2.1.3]
 at
 org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:44)
 ~[apache-cassandra-2.1.3.jar:2.1.3]
 at
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
 ~[guava-16.0.jar:na]
 at
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
 ~[guava-16.0.jar:na]
 at
 org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
 ~[apache-cassandra-2.1.3.jar:2.1.3]
 at
 org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:172)
 ~[apache-cassandra-2.1.3.jar:2.1.3]
 at
 org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:155)
 ~[apache-cassandra-2.1.3.jar:2.1.3]
 at
 org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
 ~[apache-cassandra-2.1.3.jar:2.1.3]
 at
 org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125)
 ~[apache-cassandra-2.1.3.jar:2.1.3]
 at
 org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
 ~[apache-cassandra-2.1.3.jar:2.1.3]
 at
 

Re: Setting up JNA on CentOS 6.6. with cassandra20-2.0.12 and Oracle Java 1.7.0_75

2015-02-25 Thread J. Ryan Earl
We've been using jna-3.2.4-2.el6.x86_64 with the Sun/Oracle JDK for
probably 2-years now, and it works just fine.  Where are you seeing 3.2.7
required at?  I searched the pages you link and that string isn't even in
there.

Regardless, I assure you the newest jna that ships in the EL6 repo works
without issues.
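
On EL6 that is just the stock package — roughly the following, with the usual caveat that jar
paths can differ between packagings:

    yum install -y jna
    rpm -q jna                      # e.g. jna-3.2.4-2.el6.x86_64
    ls -l /usr/share/java/jna.jar   # jar provided by the package; make sure only one jna.jar ends up on Cassandra's classpath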

On Wed, Feb 25, 2015 at 2:12 PM, Garret Pick pic...@whistle.com wrote:

 Hello,

 I'm having problems getting cassandra to start with the configuration
 listed above.

 Yum wants to install 3.2.4-2.el6 of the JNA along with several other
 packages including java-1.7.0-openjdk

  The documentation states that a JNA version earlier than 3.2.7 should not
 be used, so the jar file should be downloaded and installed directly into
 C*'s lib directory per


 http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installJnaTar.html

 From /var/log/cassandra/system.log

 all I see is

  INFO [main] 2015-02-25 20:06:10,202 CassandraDaemon.java (line 191)
 Classpath:
 /etc/cassandra/conf:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.12.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.12.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.12.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar

 and it never actually starts

 Note that JNA is in the classpath above, and when I remove it, cassandra
 starts successfully.

 I tried installing the DSE package and it looks like it wants to install
 the older 3.2.4 JNA as a dependency so there seems to be a discrepancy in
 documentation

 Per


 http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/install/installRHELdse.html

 Note: JNA (Java Native Access) is automatically installed.

 thanks for any help,
 Garret



Re: hs_err_pid3013.log, out of memory?

2014-09-18 Thread J. Ryan Earl
On Wed, Sep 17, 2014 at 8:35 PM, Yatong Zhang bluefl...@gmail.com wrote:

 @Chris Lohfink I have 16G memory per node, all the other settings are
 default

 @J. Ryan Earl I am not sure. I am using the default settings.

 But I've found out it might be because some settings in
 '/etc/sysctl.conf'. I am still testing it



If JNA is installed, it will try to memlock all of the JVM process.  For
this to happen, you have to adjust the settings for the user you run
Cassandra as under /etc/security/limits.conf or limits.d/ and you have to
modifying kernel.shmmax in sysctl.conf accordingly.  If you do not, and JNA
is installed, the memlock will fail with the error you gave.
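
For reference, the settings involved usually look something like the sketch
below; the user name and the numbers are assumptions and have to match whatever
your nodes actually run as and intend to lock:

  # /etc/security/limits.conf (or a file under /etc/security/limits.d/)
  cassandra  -  memlock  unlimited
  cassandra  -  nofile   100000

  # /etc/sysctl.conf, then reload with `sysctl -p`
  kernel.shmmax = 4294967296   # example value; must cover the memory to be locked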


Re: java.lang.OutOfMemoryError: unable to create new native thread

2014-09-18 Thread J. Ryan Earl
What's the 'ulimit -a' output of the user cassandra runs as?  From this and
your previous OOM thread, it sounds like you skipped the requisite OS
configuration.
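
(For anyone following along, one way to check is roughly the following; the
pgrep pattern is only an assumption about how the process shows up:)

  # as the user that runs Cassandra
  ulimit -a
  # or look at the limits of the running JVM directly
  cat /proc/$(pgrep -f CassandraDaemon)/limits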

On Wed, Sep 17, 2014 at 9:43 AM, Yatong Zhang bluefl...@gmail.com wrote:

 Hi there,

 I am using leveled compaction strategy and have many sstable files. The
 error was during the startup, so any idea about this?


 ERROR [FlushWriter:4] 2014-09-17 22:36:59,383 CassandraDaemon.java (line
 199) Exception in thread Thread[FlushWriter:4,5,main]
 java.lang.OutOfMemoryError: unable to create new native thread
 at java.lang.Thread.start0(Native Method)
 at java.lang.Thread.start(Thread.java:693)
 at
 java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
 at
 java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)
 ERROR [FlushWriter:2] 2014-09-17 22:36:59,472 CassandraDaemon.java (line
 199) Exception in thread Thread[FlushWriter:2,5,main]
 FSReadError in
 /data5/cass/system/compactions_in_progress/system-compactions_in_progress-jb-23-Index.db
 at
 org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:200)
 at
 org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:168)
 at
 org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:334)
 at
 org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:324)
 at
 org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:394)
 at
 org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:342)
 at
 org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
 at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)
 Caused by: java.io.IOException: Map failed
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849)
 at
 org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:192)
 ... 10 more
 Caused by: java.lang.OutOfMemoryError: Map failed
 at sun.nio.ch.FileChannelImpl.map0(Native Method)
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:846)
 ... 11 more





Re: hs_err_pid3013.log, out of memory?

2014-09-16 Thread J. Ryan Earl
Are you using JNA?  Did you adjust your memlock limit?

On Tue, Sep 16, 2014 at 9:46 AM, Chris Lohfink clohf...@blackbirdit.com
wrote:

  How much memory does your system have? How much memory is the system utilizing
  before starting Cassandra (use the command free)? What are the heap settings it
  tries to use?

 Chris

 On Sep 15, 2014, at 8:16 PM, Yatong Zhang bluefl...@gmail.com wrote:

 It's during the startup. I tried to upgrade cassandra from 2.0.7 to
  2.0.10, but it looks like cassandra could not start again. Also I found the
 following log at '/var/log/messages':

 Sep 16 09:06:59 storage6 kernel: INFO: task java:4971 blocked for more
 than 120 seconds.
 Sep 16 09:06:59 storage6 kernel:  Tainted: G
 --- H  2.6.32-431.el6.x86_64 #1
  Sep 16 09:06:59 storage6 kernel: echo 0 >
 /proc/sys/kernel/hung_task_timeout_secs disables this message.
 Sep 16 09:06:59 storage6 kernel: java  D 0003 0
 4971  1 0x0080
 Sep 16 09:06:59 storage6 kernel: 88042b591c98 0082
 81ed4ff0 8803b4f01540
 Sep 16 09:06:59 storage6 kernel: 88042b591c68 810af370
 88042b591ca0 8803b4f01540
 Sep 16 09:06:59 storage6 kernel: 8803b4f01af8 88042b591fd8
 fbc8 8803b4f01af8
 Sep 16 09:06:59 storage6 kernel: Call Trace:
 Sep 16 09:06:59 storage6 kernel: [810af370] ?
 exit_robust_list+0x90/0x160
 Sep 16 09:06:59 storage6 kernel: [81076ad5] exit_mm+0x95/0x180
 Sep 16 09:06:59 storage6 kernel: [81076f1f] do_exit+0x15f/0x870
 Sep 16 09:06:59 storage6 kernel: [81077688]
 do_group_exit+0x58/0xd0
 Sep 16 09:06:59 storage6 kernel: [8108d046]
 get_signal_to_deliver+0x1f6/0x460
 Sep 16 09:06:59 storage6 kernel: [8100a265] do_signal+0x75/0x800
 Sep 16 09:06:59 storage6 kernel: [81066629] ?
 wake_up_new_task+0xd9/0x130
 Sep 16 09:06:59 storage6 kernel: [81070ead] ?
 do_fork+0x13d/0x480
 Sep 16 09:06:59 storage6 kernel: [810b1c0b] ?
 sys_futex+0x7b/0x170
 Sep 16 09:06:59 storage6 kernel: [8100aa80]
 do_notify_resume+0x90/0xc0
 Sep 16 09:06:59 storage6 kernel: [8100b341] int_signal+0x12/0x17
 Sep 16 09:06:59 storage6 kernel: INFO: task java:4972 blocked for more
 than 120 seconds.
 Sep 16 09:06:59 storage6 kernel:  Tainted: G
 --- H  2.6.32-431.el6.x86_64 #1
  Sep 16 09:06:59 storage6 kernel: echo 0 >
 /proc/sys/kernel/hung_task_timeout_secs disables this message.
 Sep 16 09:06:59 storage6 kernel: java  D  0
 4972  1 0x0080
 Sep 16 09:06:59 storage6 kernel: 8803b4d7fc98 0082
 81ed6d78 8803b4cf1500
 Sep 16 09:06:59 storage6 kernel: 8803b4d7fc68 810af370
 8803b4d7fca0 8803b4cf1500
 Sep 16 09:06:59 storage6 kernel: 8803b4cf1ab8 8803b4d7ffd8
 fbc8 8803b4cf1ab8
 Sep 16 09:06:59 storage6 kernel: Call Trace:
 Sep 16 09:06:59 storage6 kernel: [810af370] ?
 exit_robust_list+0x90/0x160
 Sep 16 09:06:59 storage6 kernel: [81076ad5] exit_mm+0x95/0x180
 Sep 16 09:06:59 storage6 kernel: [81076f1f] do_exit+0x15f/0x870
 Sep 16 09:06:59 storage6 kernel: [81065e20] ?
 wake_up_state+0x10/0x20
 Sep 16 09:06:59 storage6 kernel: [81077688]
 do_group_exit+0x58/0xd0
 Sep 16 09:06:59 storage6 kernel: [8108d046]
 get_signal_to_deliver+0x1f6/0x460
 Sep 16 09:06:59 storage6 kernel: [8100a265] do_signal+0x75/0x800
 Sep 16 09:06:59 storage6 kernel: [810097cc] ?
 __switch_to+0x1ac/0x320
 Sep 16 09:06:59 storage6 kernel: [81527910] ?
 thread_return+0x4e/0x76e
 Sep 16 09:06:59 storage6 kernel: [810b1c0b] ?
 sys_futex+0x7b/0x170
 Sep 16 09:06:59 storage6 kernel: [8100aa80]
 do_notify_resume+0x90/0xc0
 Sep 16 09:06:59 storage6 kernel: [8100b341] int_signal+0x12/0x17
 Sep 16 09:06:59 storage6 kernel: INFO: task java:4973 blocked for more
 than 120 seconds.



 On Tue, Sep 16, 2014 at 9:00 AM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Sep 15, 2014 at 5:55 PM, Yatong Zhang bluefl...@gmail.com
 wrote:

 I just encountered an error which left a log '/hs_err_pid3013.log'. So
 is there a way to solve this?

 # There is insufficient memory for the Java Runtime Environment to
 continue.
 # Native memory allocation (malloc) failed to allocate 12288 bytes for
 committing reserved memory.


 Use less heap memory?

 You haven't specified under which circumstances this occurred, so I can
 only conjecture that it is likely being caused by writing too fast.

 Write more slowly.

 =Rob






Re: too many open files

2014-08-08 Thread J. Ryan Earl
Yes, definitely look how many open files are actual file handles versus
networks sockets.  We found a file handle leak in 2.0 but it was patched in
2.0.3 or .5 I think.  A million open files is way too high.
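
A rough way to split that count into sockets versus real file handles (the pid
lookup is an assumption about how the process is named on your boxes):

  pid=$(pgrep -f CassandraDaemon)
  lsof -n -p "$pid" | grep -c 'TCP\|UDP'      # network sockets
  lsof -n -p "$pid" | grep -cv 'TCP\|UDP'     # everything else: data files, indexes, ...
  netstat -an | grep -c ESTABLISHED           # established client/internode connections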


On Fri, Aug 8, 2014 at 5:19 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 You may have this problem if your client doesn't reuse the connection but
  opens a new one every time. So, run netstat and check the number of established
 connections. This number should not be big.

 Thank you,
   Andrey


 On Fri, Aug 8, 2014 at 12:35 PM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:

 Hi,

 I am using Cassandra 2.0.9 running on Debian Wheezy, and I am having too
 many open files exceptions when I try to perform a large number of
 operations in my 10 node cluster.

 I saw the documentation
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/troubleshooting/trblshootTooManyFiles_r.html
 and I have set everything to the recommended settings, but I keep getting
 the errors.

 In the documentation it says: Another, much less likely possibility, is
 a file descriptor leak in Cassandra. Run lsof -n | grep java to check
 that the number of file descriptors opened by Java is reasonable and
 reports the error if the number is greater than a few thousand.

 I guess it's not the case, or else a lot of people would be complaining
 about it, but I am not sure what I could do to solve the problem.

 Any hint about how to solve it?

 My client is written in python and uses Cassandra Python Driver. Here are
 the exceptions I am having in the client:
 [s1log] 2014-08-08 12:16:09,631 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.151, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,632 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.142, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,633 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.143, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,634 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.142, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,634 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.145, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,635 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.144, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,635 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.148, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,732 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.146, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,733 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.77, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,734 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.76, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,734 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.75, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,735 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.142, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,736 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.185, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,942 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.144, scheduling retry in 512.0
 seconds: Timed out connecting to 200.200.200.144
 [s1log] 2014-08-08 12:16:09,998 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.77, scheduling retry in 512.0
 seconds: Timed out connecting to 200.200.200.77


 And here is the exception I am having in the server:

  WARN [Native-Transport-Requests:163] 2014-08-08 14:27:30,499
 BatchStatement.java (line 223) Batch of prepared statements for
 [identification.entity_lookup, identification.entity] is of size 25216,
 exceeding specified threshold of 5120 by 20096.
 ERROR [Native-Transport-Requests:150] 2014-08-08 14:27:31,611
 ErrorMessage.java (line 222) Unexpected exception during request
 java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
 at 

RE: Mixing CAS and TTL.

2014-02-25 Thread J Robert Ray
Thanks Daniel.

I am taking care to only expire the one column. There are other columns so
my row isn't completely deleted.
On Feb 24, 2014 11:37 PM, Daniel Shelepov dan...@timefork.com wrote:

 For the case where you don't get the update, is your whole row removed
 when TTL expires?  If so, you're essentially looking at a non-existing row,
  and I think it's not too surprising that an IF col=null test will behave
 differently; I personally wouldn't call it a bug.  If you're dealing with
 disappearing rows, you should look into running INSERT IF NOT EXISTS
 queries instead of UPDATE IF col=null.
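
  (Roughly, that claim would look like the sketch below; it only applies if the
  whole row disappears on expiry, and 'pk' is a stand-in for whatever the real
  primary key column is:)

  INSERT INTO foo (pk, col) VALUES (123, 'some value') IF NOT EXISTS USING TTL 120;
  -- [applied] = true only when no live row with this primary key exists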



 If the row is not completely deleted when TTL expires, then the behavior
 is definitely fishy, and should probably be filed as a bug.



 To your other question, once a TTL update is expired, you can't infer its
 past existence through any queries.



 Daniel



 *From:* J Robert Ray [mailto:jrobert...@gmail.com]
 *Sent:* Monday, February 24, 2014 11:10 PM
 *To:* user@cassandra.apache.org
 *Subject:* Mixing CAS and TTL.



 Hi, I am trying to mix CAS and TTL and am wondering if this behavior that
 I am seeing is expected.



 I'm on 2.0.2 and using the java datastax 2.0.0-rc3 client.



 In my application, a server claims a row by assigning a value to a row
 using CAS, expecting the column to start out null. The column has a
 shortish TTL and while the application owns the row, it will periodically
 refresh the TTL on the column. If the application dies, the column expires
 and can be claimed by another server. My problem is that after the TTL
 expires, no server can successfully claim a row using a CAS update.



 If I set a TTL on a column with a null value (for demonstration purposes;
 the real code sets to a non-null value):



 UPDATE foo USING TTL 120 SET col = null WHERE ...;



 This CAS update will succeed:



 UPDATE foo USING TTL 120 SET col = 'some value' IF col = null; //
 [applied] = true

 or

 UPDATE foo USING TTL 120 SET col = 'some value' IF col = 'foo'; //
 [applied] = true, col = null



 However, if I allow the TTL to expire, then the same update now fails.



 UPDATE foo USING TTL 120 SET col = 'some value' IF col = null; //
 [applied] = false



 Note, after it fails, the ResultSet column definitions only contains
 [applied] and so does not provide the value of the 'col' column which
 failed the conditional update.



 It seems a null value is a different flavor of null than an expired
 column. Is it possible to make an update conditional on if a column is
 expired? Is this behavior expected or a bug?



 Thanks!



Mixing CAS and TTL.

2014-02-24 Thread J Robert Ray
Hi, I am trying to mix CAS and TTL and am wondering if this behavior that I
am seeing is expected.

I'm on 2.0.2 and using the java datastax 2.0.0-rc3 client.

In my application, a server claims a row by assigning a value to a row
using CAS, expecting the column to start out null. The column has a
shortish TTL and while the application owns the row, it will periodically
refresh the TTL on the column. If the application dies, the column expires
and can be claimed by another server. My problem is that after the TTL
expires, no server can successfully claim a row using a CAS update.

If I set a TTL on a column with a null value (for demonstration purposes;
the real code sets to a non-null value):

UPDATE foo USING TTL 120 SET col = null WHERE ...;

This CAS update will succeed:

UPDATE foo USING TTL 120 SET col = 'some value' IF col = null; // [applied]
= true
or
UPDATE foo USING TTL 120 SET col = 'some value' IF col = 'foo'; //
[applied] = true, col = null

However, if I allow the TTL to expire, then the same update now fails.

UPDATE foo USING TTL 120 SET col = 'some value' IF col = null; // [applied]
= false

Note, after it fails, the ResultSet column definitions only contains
[applied] and so does not provide the value of the 'col' column which
failed the conditional update.

It seems a null value is a different flavor of null than an expired column.
Is it possible to make an update conditional on if a column is expired? Is
this behavior expected or a bug?

Thanks!


nodetool flush usage

2014-01-09 Thread Christopher J. Bottaro
Am I correct in understanding that it needs to be run on each node in the
cluster?  For example, if I have a three node cluster, I'd have to run:

nodetool -h node-1 flush
nodetool -h node-2 flush
nodetool -h node-3 flush

?

Also, does it block until it's done?

My use case is recreating a keyspace.  According to this:

https://issues.apache.org/jira/browse/CASSANDRA-4857

The only safe way to recreate a keyspace is to drop -> flush -> create.
 This will be done programmatically and I need to know when the flush
operation has completed.

Thanks!
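
For reference, scripted it amounts to something like the sketch below (host and
keyspace names are placeholders); each invocation only flushes the node it is
pointed at, and whether it blocks until the flush finishes is exactly the
question above, so verify that before relying on it:

for h in node-1 node-2 node-3; do
    nodetool -h "$h" flush my_keyspace
done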


Re: Write performance with 1.2.12

2013-12-12 Thread J. Ryan Earl
Why did you switch to RandomPartitioner away from Murmur3Partitioner?  Have
you tried with Murmur3?


   1. # partitioner: org.apache.cassandra.dht.Murmur3Partitioner
   2. partitioner: org.apache.cassandra.dht.RandomPartitioner



On Fri, Dec 6, 2013 at 10:36 AM, srmore comom...@gmail.com wrote:




 On Fri, Dec 6, 2013 at 9:59 AM, Vicky Kak vicky@gmail.com wrote:

 You have passed the JVM configurations and not the cassandra
 configurations which is in cassandra.yaml.


 Apologies, was tuning JVM and that's what was in my mind.
 Here are the cassandra settings http://pastebin.com/uN42GgYT



 The spikes are not that significant in our case and we are running the
 cluster with 1.7 gb heap.

 Are these spikes causing any issue at your end?


 There are no big spikes; the overall performance seems to be about 40% lower.






 On Fri, Dec 6, 2013 at 9:10 PM, srmore comom...@gmail.com wrote:




 On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote:

 Hard to say much without knowing about the cassandra configurations.


 The cassandra configuration is
 -Xms8G
 -Xmx8G
 -Xmn800m
 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC
 -XX:+CMSParallelRemarkEnabled
 -XX:SurvivorRatio=4
 -XX:MaxTenuringThreshold=2
 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly



  Yes compactions/GC's could spike the CPU, I had similar behavior with
 my setup.


 Were you able to get around it ?



 -VK


 On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote:

 We have a 3 node cluster running cassandra 1.2.12, they are pretty big
 machines 64G ram with 16 cores, cassandra heap is 8G.

 The interesting observation is that, when I send traffic to one node
 its performance is 2x more than when I send traffic to all the nodes. We
 ran 1.0.11 on the same box and we observed a slight dip but not half as
 seen with 1.2.12. In both the cases we were writing with LOCAL_QUORUM.
 Changing CL to ONE make a slight improvement but not much.

 The read_Repair_chance is 0.1. We see some compactions running.

 following is my iostat -x output, sda is the ssd (for commit log) and
 sdb is the spinner.

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           66.46    0.00    8.95    0.01    0.00   24.58

 Device:  rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
 sda        0.00   27.60   0.00   4.40    0.00  256.00     58.18      0.01   2.55   1.32   0.58
 sda1       0.00    0.00   0.00   0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
 sda2       0.00   27.60   0.00   4.40    0.00  256.00     58.18      0.01   2.55   1.32   0.58
 sdb        0.00    0.00   0.00   0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
 sdb1       0.00    0.00   0.00   0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
 dm-0       0.00    0.00   0.00   0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
 dm-1       0.00    0.00   0.00   0.60    0.00    4.80      8.00      0.00   5.33   2.67   0.16
 dm-2       0.00    0.00   0.00   0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
 dm-3       0.00    0.00   0.00  24.80    0.00  198.40      8.00      0.24   9.80   0.13   0.32
 dm-4       0.00    0.00   0.00   6.60    0.00   52.80      8.00      0.01   1.36   0.55   0.36
 dm-5       0.00    0.00   0.00   0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
 dm-6       0.00    0.00   0.00  24.80    0.00  198.40      8.00      0.29  11.60   0.13   0.32



 I can see I am cpu bound here but couldn't figure out exactly what is
 causing it, is this caused by GC or Compaction ? I am thinking it is
 compaction, I see a lot of context switches and interrupts in my vmstat
 output.

 I don't see GC activity in the logs but see some compaction activity.
 Has anyone seen this ? or know what can be done to free up the CPU.

 Thanks,
 Sandeep









Re: cassandra performance problems

2013-12-06 Thread J. Ryan Earl
On Thu, Dec 5, 2013 at 6:33 AM, Alexander Shutyaev shuty...@gmail.comwrote:

 We've plugged it into our production environment as a cache in front of
 postgres. Everything worked fine, we even stressed it by explicitly
 propagating about 30G (10G/node) data from postgres to cassandra.


If you just want a caching layer, why wouldn't you use Memcached or Redis
instead?  Cassandra is designed to be a persistent store and not so much
designed as a caching layer.  If you were replacing your use of Postgres
completely, that would be appropriate.


Re: Sample Trigger Code to get inserted value

2013-12-02 Thread J Ramesh Kumar
Finally I got it working...

Below are the code snippet which will be useful for trigger users,

public Collection<RowMutation> augment(ByteBuffer key, ColumnFamily cf) {
    try {
        ByteBuffer id_bb = CompositeType.extractComponent(key, 0);
        UUID id = TimeUUIDType.instance.compose(id_bb);

        ByteBuffer data_key_bb = CompositeType.extractComponent(key, 1);
        String data_key = UTF8Type.instance.compose(data_key_bb);

        Iterator col_itr = cf.iterator();

        Column ts_col = (Column) col_itr.next();
        ByteBuffer time_bb = CompositeType.extractComponent(ts_col.name(), 0);
        long time = (TimestampType.instance.compose(time_bb)).getTime();

        Column data_bb = (Column) col_itr.next();
        String data = UTF8Type.instance.compose(data_bb.value());

        log(" id -- " + id.toString());
        log(" data_key--" + data_key);
        log(" time == " + time);
        log(" data == " + data);
    } catch (Exception e) {
        logger.warn("Exception ", e);
    }
    return null;
}

PS: Since I know my table format, I hardcoded the column comparator type.
If we want to write generic trigger code, we can use
cf.getComponentComparator()/getKeyValidator().

- Ramesh


On Wed, Nov 27, 2013 at 5:10 PM, J Ramesh Kumar rameshj1...@gmail.comwrote:

 Hi,

 I need your help on extract column names and values in the trigger augment
 method.

 *Table Def :*

 create table dy_data (
 id timeuuid,
 data_key text,
 time timestamp,
 data text,
 primary key((id,data_key),time)) with clustering order by (time desc);

 public class ArchiveTrigger implements ITrigger {
     public Collection<RowMutation> augment(ByteBuffer key, ColumnFamily cf) {
         try {
             // Below loop only has 2 columns (one is data and the other may be
             // time, but I am not sure, because I cannot get the value).
             for (Column cell : cf) {
                 // Got an Exception if I try to get the column name
                 String name = ByteBufferUtil.string(cell.name());
                 // Got only the data column value and an empty value for the other
                 // column (may be time). If I try ByteBufferUtil.toLong(cell.value())
                 // it throws an exception.
                 String value = ByteBufferUtil.string(cell.value());
                 log(" name = " + name);
                 log(" value = " + value);
             }
         } catch (Exception e) {
             logger.warn("Exception ", e);
         }
         return null;
     }
 }


 I tried my best to search sample code in google. But failed. Please help
 me with sample code.

 Thanks in advance.

 Regards,
 Ramesh



Sample Trigger Code to get inserted value

2013-11-27 Thread J Ramesh Kumar
Hi,

I need your help on extract column names and values in the trigger augment
method.

*Table Def :*

create table dy_data (
id timeuuid,
data_key text,
time timestamp,
data text,
primary key((id,data_key),time)) with clustering order by (time desc);

public class ArchiveTrigger implements ITrigger {
    public Collection<RowMutation> augment(ByteBuffer key, ColumnFamily cf) {
        try {
            // Below loop only has 2 columns (one is data and the other may be
            // time, but I am not sure, because I cannot get the value).
            for (Column cell : cf) {
                // Got an Exception if I try to get the column name
                String name = ByteBufferUtil.string(cell.name());
                // Got only the data column value and an empty value for the other
                // column (may be time). If I try ByteBufferUtil.toLong(cell.value())
                // it throws an exception.
                String value = ByteBufferUtil.string(cell.value());
                log(" name = " + name);
                log(" value = " + value);
            }
        } catch (Exception e) {
            logger.warn("Exception ", e);
        }
        return null;
    }
}


I tried my best to search sample code in google. But failed. Please help me
with sample code.

Thanks in advance.

Regards,
Ramesh


Unable to load dependent classes of a trigger

2013-11-26 Thread J Ramesh Kumar
Hi,

I wrote a trigger and it internally calls some other classes. I added
all the dependent classes into a jar and put it into *conf/triggers*.
But cassandra does not load the dependent classes which are available in the
jar. How can I solve this issue?
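
For what it's worth, the packaging step looks roughly like the sketch below
(paths are assumptions for a package install; the point is that the trigger
class and every class it calls end up inside the same jar under the triggers
directory):

jar cf my-trigger.jar -C build/classes .
cp my-trigger.jar /etc/cassandra/conf/triggers/
# then restart the node (newer versions also have a nodetool reloadtriggers command)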

Thanks,
Ramesh

Traces found in the cassandra log

DEBUG [Thrift:1] 2013-11-26 16:44:05,509 CustomClassLoader.java (line 115)
Class not found using parent class loader,
java.lang.ClassNotFoundException:
com.zoho.predict.trigger.AATriggersThreadLocal
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at
org.apache.cassandra.triggers.CustomClassLoader.loadClassInternal(CustomClassLoader.java:111)
at
org.apache.cassandra.triggers.CustomClassLoader.loadClass(CustomClassLoader.java:103)
at
com.zoho.predict.trigger.ArchiveTrigger.getString(ArchiveTrigger.java:68)
at
com.zoho.predict.trigger.ArchiveTrigger.augment(ArchiveTrigger.java:49)
at
org.apache.cassandra.triggers.TriggerExecutor.execute(TriggerExecutor.java:123)
at
org.apache.cassandra.triggers.TriggerExecutor.execute(TriggerExecutor.java:73)
at
org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:547)
at
org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:379)
at
org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:363)
at
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:101)
at
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:117)
at
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:108)
at
org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1933)
at
org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4394)
at
org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4378)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:194)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)


Re: nodetool repair seems to increase linearly with number of keyspaces

2013-11-26 Thread Christopher J. Bottaro
We only have a single CF per keyspace.  Actually we have 2, but one is tiny
(only has 2 rows in it and is queried once a month or less).

Yup, using vnodes with 256 tokens.

Cassandra 1.2.10.

-- C


On Mon, Nov 25, 2013 at 2:28 PM, John Pyeatt john.pye...@singlewire.comwrote:

 Mr. Bottaro,

 About how many column families are in your keyspaces? We have 28 per
 keyspace.

 Are you using Vnodes? We are and they are set to 256

 What version of cassandra are you running. We are running 1.2.9


 On Mon, Nov 25, 2013 at 11:36 AM, Christopher J. Bottaro 
 cjbott...@academicworks.com wrote:

 We have the same setup:  one keyspace per client, and currently about 300
 keyspaces.  nodetool repair takes a long time, 4 hours with -pr on a single
 node.  We have a 4 node cluster with about 10 gb per node.  Unfortunately,
 we haven't been keeping track of the running time as keyspaces, or load,
 increases.

 -- C


 On Wed, Nov 20, 2013 at 6:53 AM, John Pyeatt 
 john.pye...@singlewire.comwrote:

 We have an application that has been designed to use potentially 100s of
 keyspaces (one for each company).

 One thing we are noticing is that nodetool repair across all of the
 keyspaces seems to increase linearly based on the number of keyspaces. For
 example, if we have a 6 node ec2 (m1.large) cluster across 3 Availability
 Zones and create 20 keyspaces a nodetool repair -pr on one node takes 3
 hours even with no data in any of the keyspaces. If I bump that up to 40
 keyspaces it takes 6 hours.

 Is this the behaviour you would expect?

 Is there anything you can think of (short of redesigning the cluster to
 limit keyspaces) to increase the performance of the nodetool repairs?

 My obvious concern is that as this application grows and we get more
  companies using it we will eventually have too many keyspaces to
 perform repairs on the cluster.

 --
 John Pyeatt
 Singlewire Software, LLC
 www.singlewire.com
 --
 608.661.1184
 john.pye...@singlewire.com





 --
 John Pyeatt
 Singlewire Software, LLC
 www.singlewire.com
 --
 608.661.1184
 john.pye...@singlewire.com
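
(For reference, the per-keyspace primary-range repair being timed above amounts
to something like the following; the keyspace names are placeholders:)

for ks in company_a company_b company_c; do
    nodetool -h node-1 repair -pr "$ks"
done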



Data loss when swapping out cluster

2013-11-25 Thread Christopher J. Bottaro
Hello,

We recently experienced (pretty severe) data loss after moving our 4 node
Cassandra cluster from one EC2 availability zone to another.  Our strategy
for doing so was as follows:

   - One at a time, bring up new nodes in the new availability zone and
   have them join the cluster.
   - One at a time, decommission the old nodes in the old availability zone
   and turn them off (stop the Cassandra process).

Everything seemed to work as expected.  As we decommissioned each node, we
checked the logs for messages indicating "yes, this node is done
decommissioning" before turning the node off.
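
(In nodetool terms, each decommission step was roughly the sketch below, run on
the node being removed, with netstats and the log message as the completion
signal:)

nodetool decommission   # streams this node's ranges to the remaining owners
nodetool netstats       # confirm streaming has finished before stopping the process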

Pretty quickly after the old nodes left the cluster, we started getting
client calls about data missing.

We immediately turned the old nodes back on and when they rejoined the
cluster *most* of the reported missing data returned.  For the rest of the
missing data, we had to spin up a new cluster from EBS snapshots and copy
it over.

What did we do wrong?

In hindsight, we noticed a few things which may be clues...

   - The new nodes had much lower load after joining the cluster than the
   old ones (3-4 gb as opposed to 10 gb).
   - We have EC2Snitch turned on, although we're using SimpleStrategy for
   replication.
   - The new nodes showed even ownership (via nodetool status) after
   joining the cluster.

Here's more info about our cluster...

   - Cassandra 1.2.10
   - Replication factor of 3
   - Vnodes with 256 tokens
   - All tables made via CQL
   - Data dirs on EBS (yes, we are aware of the performance implications)


Thanks for the help.


Re: Cassandra high heap utilization under heavy reads and writes.

2013-11-25 Thread Christopher J. Bottaro
Yes, we saw this same behavior.

A couple of months ago, we moved a large portion of our data out of
Postgres and into Cassandra.  The initial migration was done in a
distributed manner:  we had 600 (or 800, can't remember) processes
reading from Postgres and writing to Cassandra in tight loops.  This caused
the exact behavior you described.  We also did a read before a write.

After we got through the initial data migration, our normal workload is
*much* less writes (and reads for that matter) such that our cluster can
easily handle it, so we didn't investigate further.

-- C


On Sat, Nov 23, 2013 at 10:55 PM, srmore comom...@gmail.com wrote:

 Hello,
 We moved to cassandra 1.2.9 from 1.0.11 to take advantage of the off-heap
 bloom filters and other improvements.

 We see a lot of messages dropped under high load conditions. We noticed
 that when we do heavy read AND write simultaneously (we read first and
  check whether the key exists; if not, we write it) Cassandra heap increases
 dramatically and then gossip marks the node down (as a result of high load
 on the node).


 Under heavy 'reads only' we don't see this behavior.  Has anyone seen this
 behavior ? any suggestions.

 Thanks !





Re: nodetool repair seems to increase linearly with number of keyspaces

2013-11-25 Thread Christopher J. Bottaro
We have the same setup:  one keyspace per client, and currently about 300
keyspaces.  nodetool repair takes a long time, 4 hours with -pr on a single
node.  We have a 4 node cluster with about 10 gb per node.  Unfortunately,
we haven't been keeping track of the running time as keyspaces, or load,
increases.

-- C


On Wed, Nov 20, 2013 at 6:53 AM, John Pyeatt john.pye...@singlewire.comwrote:

 We have an application that has been designed to use potentially 100s of
 keyspaces (one for each company).

 One thing we are noticing is that nodetool repair across all of the
 keyspaces seems to increase linearly based on the number of keyspaces. For
 example, if we have a 6 node ec2 (m1.large) cluster across 3 Availability
 Zones and create 20 keyspaces a nodetool repair -pr on one node takes 3
 hours even with no data in any of the keyspaces. If I bump that up to 40
 keyspaces it takes 6 hours.

 Is this the behaviour you would expect?

 Is there anything you can think of (short of redesigning the cluster to
 limit keyspaces) to increase the performance of the nodetool repairs?

 My obvious concern is that as this application grows and we get more
  companies using it we will eventually have too many keyspaces to
 perform repairs on the cluster.

 --
 John Pyeatt
 Singlewire Software, LLC
 www.singlewire.com
 --
 608.661.1184
 john.pye...@singlewire.com



Re: Getting into Too many open files issues

2013-11-20 Thread J. Ryan Earl
There was a bug introduced in 2.0.0-beta1 related to TTL, a patch just came
available in: https://issues.apache.org/jira/browse/CASSANDRA-6275


On Thu, Nov 7, 2013 at 5:15 AM, Murthy Chelankuri kmurt...@gmail.comwrote:

 I have been experimenting with the latest cassandra version for storing huge
 data in our application.

 Writes are doing well, but when it comes to reads I have observed that
 cassandra is getting into too many open files issues. When I check the logs
 it's not able to open the cassandra data files any more because of the file
 descriptor limits.


 Can someone suggest what I am doing wrong, and what issues could be causing
 the read operations to run into the Too many open files issue?



Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled

2013-11-14 Thread J. Ryan Earl
First off, I'm curious what hardware (system specs) you're running this on?

Secondly, here are some observations:
* You're not running the newest JDK7, I can tell by your stack-size.
 Consider getting the newest.

* Cassandra 2.0.2 has a lot of improvements, consider upgrading.  We
noticed improved heap usage compared to 2.0.0

* Have you simply tried decreasing the size of your row cache?  Tried 256MB?

* Do you have JNA installed?  Otherwise, you're not getting off-heap usage
for these caches which seems likely.  Check your cassandra.log to verify
JNA operation.

* Your NewGen is too small.  See your heap peaks?  This is because
short-lived memory is being put into OldGen, which only gets cleaned up
during fullGC.  You should set your NewGen to about 25-30% of your total
heapsize.  Many objects are short-lived, and CMS GC is significantly more
efficient if the shorter-lived objects never get promoted to OldGen; you'll
get more concurrent, non-blocking GC.  If you're not using JNA (per above)
row-cache and key-cache is still on-heap, so you want your NewGen to be >=
twice as large as the size of these combined caches.  You should never see
those crazy heap spikes; your caches are essentially overflowing into
OldGen (with JNA).
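
A sketch of what that looks like in cassandra-env.sh terms (the numbers are only
an illustration of the 25-30% ratio, not a recommendation):

MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="2G"    # ~25% of MAX_HEAP_SIZE; ends up as -Xmn2G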



On Tue, Nov 5, 2013 at 3:04 AM, Jiri Horky ho...@avast.com wrote:

 Hi there,

 we are seeing extensive memory allocation leading to quite long and
 frequent GC pauses when using row cache. This is on cassandra 2.0.0
 cluster with JNA 4.0 library with following settings:

 key_cache_size_in_mb: 300
 key_cache_save_period: 14400
 row_cache_size_in_mb: 1024
 row_cache_save_period: 14400
 commitlog_sync: periodic
 commitlog_sync_period_in_ms: 1
 commitlog_segment_size_in_mb: 32

 -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms10G -Xmx10G
 -Xmn1024M -XX:+HeapDumpOnOutOfMemoryError

 -XX:HeapDumpPath=/data2/cassandra-work/instance-1/cassandra-1383566283-pid1893.hprof
 -Xss180k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
 -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark

 We have disabled row cache on one node to see  the  difference. Please
 see attached plots from visual VM, I think that the effect is quite
 visible. I have also taken 10x jmap -histo after 5s on a affected
 server and plotted the result, attached as well.

 I have taken a dump of the application when the heap size was 10GB, most
 of the memory was unreachable, which was expected. The majority was used
 by 55-59M objects of HeapByteBuffer, byte[] and
 org.apache.cassandra.db.Column classes. I also include a list of inbound
 references to the HeapByteBuffer objects from which it should be visible
 where they are being allocated. This was acquired using Eclipse MAT.

 Here is the comparison of GC times when row cache enabled and disabled:

 prg01 - row cache enabled
   - uptime 20h45m
   - ConcurrentMarkSweep - 11494686ms
   - ParNew - 14690885 ms
   - time spent in GC: 35%
 prg02 - row cache disabled
   - uptime 23h45m
   - ConcurrentMarkSweep - 251ms
   - ParNew - 230791 ms
   - time spent in GC: 0.27%

 I would be grateful for any hints. Please let me know if you need any
 further information. For now, we are going to disable the row cache.

 Regards
 Jiri Horky



Re: Multi-dc restart impact

2013-10-10 Thread J. Ryan Earl
Are you doing QUORUM reads instead of LOCAL_QUORUM reads?


On Wed, Oct 9, 2013 at 7:41 PM, Chris Burroughs
chris.burrou...@gmail.com wrote:

 I have not been able to do the test with the 2nd cluster, but have been
 given a disturbing data point.  We had a disk slowly fail causing a
 significant performance degradation that was only resolved when the sick
 node was killed.
  * Perf in DC w/ sick disk: 
 http://i.imgur.com/W1I5ymL.png?1
  * perf in other DC: 
 http://i.imgur.com/gEMrLyF.png?1

 Not only was a single slow node able to cause an order of magnitude
 performance hit in a dc, but the other dc fared *worse*.


 On 09/18/2013 08:50 AM, Chris Burroughs wrote:

 On 09/17/2013 04:44 PM, Robert Coli wrote:

 On Thu, Sep 5, 2013 at 6:14 AM, Chris Burroughs
  chris.burrou...@gmail.com wrote:

  We have a 2 DC cluster running cassandra 1.2.9.  They are in actual
 physically separate DCs on opposite coasts of the US, not just logical
 ones.  The primary use of this cluster is CL.ONE reads out of a single
 column family.  My expectation was that in such a scenario restarts
 would
 have minimal impact in the DC where the restart occurred, and no
 impact in
 the remote DC.

 We are seeing instead that restarts in one DC have a dramatic impact on
 performance in the other (let's call them DCs A and B).


 Did you end up filing a JIRA on this, or some other outcome?

 =Rob



 No.  I am currently in the process of taking a 2nd cluster from being
 single to dual DC.  Once that is done I was going to repeat the test
 with each cluster and gather as much information as reasonable.





Help on Cassandra Limitaions

2013-09-05 Thread J Ramesh Kumar
Hi,

http://wiki.apache.org/cassandra/CassandraLimitations

In the above link, I found the below limitation,

The maximum number of cells (rows x columns) in a single partition is 2
billion..

Here, what does partition mean? Is it a node, a column family, or
something else?

Thanks,
Ramesh


Re: Date range queries

2013-06-24 Thread Christopher J. Bottaro
Yes, that makes sense and that article helped a lot, but I still have a few
questions...

The created_at in our answers table is basically used as a version id.
 When a user updates his answer, we don't overwrite the old answer, but
rather insert a new answer with a more recent timestamp (the version).

answers
---
user_id | created_at | question_id | result
---
  1 | 2013-01-01 | 1   | yes
  1 | 2013-01-01 | 2   | blah
  1 | 2013-01-02 | 1   | no

So the queries we really want to run are find me all the answers for a
given user at a given time.  So given the date of 2013-01-02 and user_id
1, we would want rows 2 and 3 returned (since row 3 obsoletes row 1).  Is
it possible to do this with CQL given the current schema?

As an aside, we can do this in Postgresql using window functions, not
standard SQL, but pretty neat.

We can alter our schema like so...

answers
---
user_id | start_at | end_at | question_id | result

Where the start_at and end_at denote when an answer is active.  So the
example above would become:

answers
---
user_id | start_at   | end_at | question_id | result

  1 | 2013-01-01 | 2013-01-02 | 1   | yes
  1 | 2013-01-01 | null   | 2   | blah
  1 | 2013-01-02 | null   | 1   | no

Now we can query SELECT * FROM answers WHERE user_id = 1 AND start_at <=
'2013-01-02' AND (end_at > '2013-01-02' OR end_at IS NULL).

How would one define the partitioning key and cluster columns in CQL to
accomplish this?  Is it as simple as PRIMARY KEY (user_id, start_at,
end_at, question_id) (remembering that we sometimes want to limit by
question_id)?

Also, we are a bit worried about race conditions.  Consider two separate
processes updating an answer for a given user_id / question_id.  There will
be a race condition between the two to update the correct row's end_at
field.  Does that make sense?  I can draw it out with ASCII tables, but I
feel like this email is already too long... :P

Thanks for the help.



On Wed, Jun 19, 2013 at 2:28 PM, David McNelis dmcne...@gmail.com wrote:

 So, if you want to grab by the created_at and occasionally limit by
 question id, that is why you'd use created_at.

 The way the primary keys work is the first part of the primary key is the
  Partitioner key; that field is essentially what makes up the single cassandra row.
  The second key is the order preserving key, so you can sort by that key.
  If you have a third piece, then that is the secondary order preserving key.

 The reason you'd want to do (user_id, created_at, question_id) is because
  when you do a query on the keys, you MUST use the preceding pieces of
 the primary key.  So in your case, you could not do a query with just
 user_id and question_id with the user-created-question key.  Alternatively
 if you went with (user_id, question_id, created_at), you would not be able
 to include a range of created_at unless you were also filtering on the
 question_id.

 Does that make sense?

 As for the large rows, 10k is unlikely to cause you too many issues
 (unless the answer is potentially a big blob of text).  Newer versions of
  cassandra deal with a lot of things in far, far superior ways to < 1.0.

  For a really good primer on keys in cql and how to potentially avoid hot
 rows, a really good article to read is this one:
 http://thelastpickle.com/2013/01/11/primary-keys-in-cql/  Aaron did a
 great job of laying out the subtleties of primary keys in CQL.


 On Wed, Jun 19, 2013 at 2:21 PM, Christopher J. Bottaro 
 cjbott...@academicworks.com wrote:

 Interesting, thank you for the reply.

 Two questions though...

 Why should created_at come before question_id in the primary key?  In
 other words, why (user_id, created_at, question_id) instead of (user_id,
 question_id, created_at)?

 Given this setup, all a user's answers (all 10k) will be stored in a
 single C* (internal, not cql) row?  I thought having fat or big rows
 was bad.  I worked with Cassandra 0.6 at my previous job and given the
 nature of our work, we would sometimes generate these fat rows... at
 which point Cassandra would basically shit the bed.

 Thanks for the help.


 On Wed, Jun 19, 2013 at 12:26 PM, David McNelis dmcne...@gmail.comwrote:

 I think you'd just be better served with just a little different primary
 key.

 If your primary key was (user_id, created_at)  or (user_id, created_at,
 question_id), then you'd be able to run the above query without a problem.

 This will mean that the entire pantheon of a specific user_id will be
 stored as a 'row' (in the old style C* vernacular), and then the
 information would be ordered by the 2nd piece of the primary key (or 2nd,
 then 3rd if you included question_id).

 You would certainly want to include any field that makes a record unique
 in the primary key.  Another thing to note is that if a field is part

Date range queries

2013-06-19 Thread Christopher J. Bottaro
Hello,

We are considering using Cassandra and I want to make sure our use case
fits Cassandra's strengths.  We have a table like:

answers
---
user_id | question_id | result | created_at

Where our most common query will be something like:

SELECT * FROM answers WHERE user_id = 123 AND created_at > '01/01/2012' AND
created_at < '01/01/2013'

Sometimes we will also limit by a question_id or a list of question_ids.

Secondary indexes will be created on user_id and question_id.  We expect
the upper bound of number of answers for a given user to be around 10,000.

Now my understanding of how Cassandra will run the aforementioned query is
that it will load all the answers for a given user into memory using the
secondary index, then scan over that set filtering based on the dates.

Considering that that will be our most used query and it will happen very
often, is this a bad use case for Cassandra?

Thanks for the help.


Re: Date range queries

2013-06-19 Thread Christopher J. Bottaro
Interesting, thank you for the reply.

Two questions though...

Why should created_at come before question_id in the primary key?  In other
words, why (user_id, created_at, question_id) instead of (user_id,
question_id, created_at)?

Given this setup, all a user's answers (all 10k) will be stored in a single
C* (internal, not cql) row?  I thought having fat or big rows was bad.
 I worked with Cassandra 0.6 at my previous job and given the nature of our
work, we would sometimes generate these fat rows... at which point
Cassandra would basically shit the bed.

Thanks for the help.


On Wed, Jun 19, 2013 at 12:26 PM, David McNelis dmcne...@gmail.com wrote:

 I think you'd just be better served with just a little different primary
 key.

 If your primary key was (user_id, created_at)  or (user_id, created_at,
 question_id), then you'd be able to run the above query without a problem.

 This will mean that the entire pantheon of a specific user_id will be
 stored as a 'row' (in the old style C* vernacular), and then the
 information would be ordered by the 2nd piece of the primary key (or 2nd,
 then 3rd if you included question_id).

 You would certainly want to include any field that makes a record unique
 in the primary key.  Another thing to note is that if a field is part of
 the primary key you can not create a secondary index on that field.  You
 can work around that by storing the field twice, but you might want to
 rethink your structure if you find yourself doing that often.


 On Wed, Jun 19, 2013 at 12:05 PM, Christopher J. Bottaro 
 cjbott...@academicworks.com wrote:

 Hello,

 We are considering using Cassandra and I want to make sure our use case
 fits Cassandra's strengths.  We have the table like:

 answers
 ---
 user_id | question_id | result | created_at

 Where our most common query will be something like:

  SELECT * FROM answers WHERE user_id = 123 AND created_at > '01/01/2012'
  AND created_at < '01/01/2013'

 Sometimes we will also limit by a question_id or a list of question_ids.

 Secondary indexes will be created on user_id and question_id.  We expect
 the upper bound of number of answers for a given user to be around 10,000.

 Now my understanding of how Cassandra will run the aforementioned query
 is that it will load all the answers for a given user into memory using the
 secondary index, then scan over that set filtering based on the dates.

 Considering that that will be our most used query and it will happen very
 often, is this a bad use case for Cassandra?

 Thanks for the help.





Issues with describe_splits_ex

2013-02-25 Thread Hermán J. Camarena
Hi,
I'm trying to use describe_splits_ex to get splits for local records only.  
When I call it, I always get a list with only one CfSplit.  The start_token and 
end_token are always the same ones I passed as input, and row_count is always 128.
I'm using 1.1.9.  What am I doing wrong?
Thanks,
Hermán




Re: Strange delay in query

2012-11-13 Thread J. D. Jordan
Correct

On Nov 13, 2012, at 5:21 AM, André Cruz andre.c...@co.sapo.pt wrote:

 On Nov 13, 2012, at 8:54 AM, aaron morton aa...@thelastpickle.com wrote:
 
 I don't think that statement is accurate.
 Which part ?
 
 Probably this part:
 After running a major compaction, automatic minor compactions are no longer 
 triggered, frequently requiring you to manually run major compactions on a 
 routine basis.
 
 From what I read what happens is that it takes a lot longer for minor 
 compactions to be triggered because 3 more files with the size equal to the 
 compacted one have to be created?
 
 André


Astyanax error

2012-09-17 Thread A J
Hello,

I am trying to retrieve a list of Column Names (that are defined as
Integer) from a CF with RowKey as Integer as well. (I don't care for
the column values that are just nulls)

Following is snippet of my Astyanax code. I am getting 0 columns but I
know the key that I am querying contains a few hundred columns. Any
idea what part of the code below is incorrect ?

Thanks.

Astyanax code:

ColumnFamily<Integer, Integer> CF1 =
    new ColumnFamily<Integer, Integer>(
        "CF1",                     // Column Family Name
        IntegerSerializer.get(),   // Key Serializer
        IntegerSerializer.get());  // Column Serializer

// Reading data
int NUM_EVENTS = 9;

StopWatch clock = new StopWatch();
clock.start();
for (int i = 0; i < NUM_EVENTS; ++i) {
    ColumnList<Integer> result = keyspace.prepareQuery(CF1)
        .getKey(1919)
        .execute().getResult();
    System.out.println(" results are: " + result.size());
}
clock.stop();



CF definition:
===
[default@ks1] describe CF1;
ColumnFamily: CF1
  Key Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.IntegerType


Astyanax - build

2012-09-14 Thread A J
Hi,
I am new to java and trying to get the Astyanax client running for Cassandra.

Downloaded astyanax from https://github.com/Netflix/astyanax. How do I
compile the source code from here in a very simple fashion from the
linux command line?

Thanks.
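
(For reference, assuming the checkout still ships Netflix's usual Gradle
wrapper, a command-line build is roughly:)

git clone https://github.com/Netflix/astyanax.git
cd astyanax
./gradlew build    # jars end up under each module's build/libs directory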


Advantage of pre-defining column metadata

2012-08-28 Thread A J
For a static column family, what is the advantage of pre-defining column metadata?

I can see ease of understanding type of values that the CF contains
and that clients will reject incompatible insertion.

But are there any major advantages in terms of performance or
something else that makes it beneficial to define the metadata upfront
?

Thanks.


nodetool , localhost connection refused

2012-08-20 Thread A J
I am running 1.1.3
Nodetool on the database node (just a single node db) is giving the error:
Failed to connect to 'localhost:7199': Connection refused

Any idea what could be causing this ?

Thanks.


Re: nodetool , localhost connection refused

2012-08-20 Thread A J
Yes, the telnet does not work.
Don't know what it was but switching to 1.1.4 solved the issue.

On Mon, Aug 20, 2012 at 6:17 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
 My guess is telnet localhost 7199 also fails?  And if you are on linux
 and run netstat -anp, you will see no one is listening on that port?

 So database node did not start and bind to that port and you would see
  exception in the logs of that database node... just a guess.

 Dean

 On 8/20/12 4:10 PM, A J s5a...@gmail.com wrote:

I am running 1.1.3
Nodetool on the database node (just a single node db) is giving the error:
Failed to connect to 'localhost:7199': Connection refused

Any idea what could be causing this ?

Thanks.



'WHERE' with several indexed columns

2012-08-16 Thread A J
Hi
If I have a WHERE clause in CQL with several 'AND' and each column is
indexed, which index(es) is(are) used ?
Just the first field in the where clause or all the indexes involved
in the clause ?

Also, is the index used only with an equality operator, or with greater-than
/ less-than comparators as well?

Thanks.


Custom Partitioner Type

2012-08-13 Thread A J
Is it possible to use a custom Partitioner type (other than RP or BOP) ?
Say if my rowkeys are all Integers and I want all even keys to go to
node1 and odd keys to node2, is it feasible? How would I go about it?

Thanks.


Physical storage of rowkey

2012-08-09 Thread A J
Are row keys hashed before being physically stored in Cassandra? If
so, what hash function is used to ensure collisions are minimal?

Thanks.


Re: TimedOutException caused by

2012-07-24 Thread J . Amding


aaron morton aaron at thelastpickle.com writes:

 
 The cluster is running into GC problems and this is slowing it down under the 
stress test. When it slows down one or more of the nodes is failing to perform 
the write within rpc_timeout . This causes the coordinator of the write to 
raise 
the TimedOutException. 
 You options are:
 
 * allocate more memory
 * ease back on the stress test. 
 * work at CL QUORUM so that one node failing does not result in the error. 
 
 see also http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts
 
 
 Cheers
  
 
 
 
 
 
 -
 Aaron Morton
 Freelance Developer
  at aaronmorton
 http://www.thelastpickle.com
 
 
 
 On 28/05/2012, at 12:59 PM, Jason Tang wrote:
 Hi
 My system is a 4-node 64-bit cassandra cluster, 6G per node, default 
configuration (which means 1/3 of the heap for memtables), replication factor 3, write 
ALL, read ONE.
 When I run stress load testing, I got this TimedOutException, some 
operations failed, and all traffic hung for a while. 
 
 And when I ran a 1G-memory 32-bit cassandra in standalone mode, I didn't see 
such frequent stop-the-world behavior.
 
 So I wonder what kind of operation will hang the cassandra system. 
 
 
 How to collect information for tuning.
 
 From the system log and documentation, I guess there are three types of operations:
 1) Flush memtable when meet max size
 
 2) Compact SSTable (why?)
 3) Java GC
 
 system.log:
 
  INFO [main] 2012-05-25 16:12:17,054 ColumnFamilyStore.java (line 688) 
Enqueuing flush of Memtable-LocationInfo at 1229893321(53/66 serialized/live 
bytes, 2 ops)
  INFO [FlushWriter:1] 2012-05-25 16:12:17,054 Memtable.java (line 239) 
 Writing 
Memtable-LocationInfo at 1229893321(53/66 serialized/live bytes, 2 ops)
  INFO [FlushWriter:1] 2012-05-25 16:12:17,166 Memtable.java (line 275) 
Completed flushing /var/proclog/raw/cassandra/data/system/LocationInfo-hb-2-
Data.db (163 bytes)
 
 ...
 
  INFO [CompactionExecutor:441] 2012-05-28 08:02:55,345 CompactionTask.java (line 112) Compacting 
[SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-41-Data.db'),
 SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-32-Data.db'),
 SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-37-Data.db'),
 SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-53-Data.db')]
 ...
 
 
  WARN [ScheduledTasks:1] 2012-05-28 08:02:26,619 GCInspector.java (line 146) 
Heap is 0.7993011015621736 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
  INFO [ScheduledTasks:1] 2012-05-28 08:02:54,980 GCInspector.java (line 123) 
GC for ConcurrentMarkSweep: 728 ms for 2 collections, 3594946600 used; max is 
6274678784
  INFO [ScheduledTasks:1] 2012-05-28 08:41:34,030 GCInspector.java (line 123) 
GC for ParNew: 1668 ms for 1 collections, 4171503448 used; max is 6274678784
  INFO [ScheduledTasks:1] 2012-05-28 08:41:48,978 GCInspector.java (line 123) 
GC for ParNew: 1087 ms for 1 collections, 2623067496 used; max is 6274678784
  INFO [ScheduledTasks:1] 2012-05-28 08:41:48,987 GCInspector.java (line 123) 
GC for ConcurrentMarkSweep: 3198 ms for 3 collections, 2623361280 used; max is 
6274678784
 
 
 
 Timeout Exception:
 
 Caused by: org.apache.cassandra.thrift.TimedOutException: null
         at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19495) ~[na:na]
         at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035) ~[na:na]
         at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009) ~[na:na]
         at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95) ~[na:na]
         ... 64 common frames omitted
 
 
 BRs
 //Tang Weiqiang
 
 
 
 
 
 
 

Hi, I've been running into the same type of issue, but on a single machine with 
CL ONE, also using a custom insertion stress utility. What would I need to do to 
address the timeouts? By 'allocate more memory', do you mean increasing the heap size in 
the environment conf file?

Thanks,
J.







wildcards as both ends

2012-06-20 Thread Sam Z J
Hi all

I'm wondering how or if it's possible to implement efficient wildcards at
both ends, e.g. *string*

I can think of a few options... please comment, thanks =D

- if I can get another equality constraint which narrows down the potential
result set significantly, I can do a scan. I'm not sure how feasible this
is without benchmarks. Does anyone know if I can scan a couple hundred
/ thousand rows in a 3-node replication factor=2 cluster quickly?

- for each string I have, index all the prefixes in a column family, e.g.
for string 'string', I'd have rows string, strin, stri, str, st, s, with
column values somehow pointing back to the row keys (see the sketch below). This
almost blows up the storage needed =/ (also, what do I do if I hit the 2 billion
row width limit? is there a way to say 'insert into another row if the current
one is full'?)

thanks

-- 
Zhongshi (Sam) Jiang
sammyjiang...@gmail.com
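
(A minimal sketch of the prefix-index idea from the second option above, in
plain Python with no client calls shown; the helper name and layout are just
for illustration. Each stored string gets one index entry per prefix, so a
trailing-wildcard lookup like 'string*' becomes a single row read; a leading
wildcard could be handled the same way on the reversed string, and a true
contains lookup would need an entry per suffix as well.)

def prefix_index_entries(value, row_key):
    # Yield (index_row_key, column_name) pairs, one per prefix of value.
    for i in range(1, len(value) + 1):
        yield value[:i], row_key

# Indexing the string 'string' stored under row key 'doc42':
for index_row, column in prefix_index_entries('string', 'doc42'):
    print('%s -> %s' % (index_row, column))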


Re: unsubscribe

2012-05-02 Thread J. Zach Richardson
Maybe we need an auto responder for emails that contain unsubscribe
On May 2, 2012, at 9:14 AM, Eric Evans wrote:

 On Tue, May 1, 2012 at 9:05 AM, Gmail matthewapet...@gmail.com wrote:
 unsubscribe
 
 http://qkme.me/35w46c
 
 -- 
 Eric Evans
 Acunu | http://www.acunu.com | @acunu



Cql 3 wide rows filter expressions in where clause

2012-04-20 Thread Nagaraj J
Hi

cql 3 for wide rows is very promising. I was wondering if there is support
for filtering wide rows by additional filter expressions in where clause
(columns other than those which are part of the composite). 

Ex.
suppose I have a sparse CF

create columnfamily scf( k ascii, o ascii, x ascii, y ascii, z ascii,
PRIMARY KEY(k, o));

is it possible to have a query

select * from scf where k=1 and x=2 and z=2 order by o ASC;

I tried this with 1.1-rc and it doesn't work as expected. I also looked at
cql_tests.py in https://issues.apache.org/jira/browse/CASSANDRA-2474; there
is no mention of this. 

Am I missing something here ?

Thanks in advance
Nagaraj 

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cql-3-wide-rows-filter-expressions-in-where-clause-tp7486344p7486344.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: solr query for string match in CQL

2012-04-12 Thread A J
Never mind.
Double quotes within the single quotes worked:
select * from solr where solr_query='body:sixty eight million nine
hundred forty three thousand four hundred twenty four';


On Thu, Apr 12, 2012 at 11:42 AM, A J s5a...@gmail.com wrote:
 What is the syntax for a string match in CQL for solr_query ?

 cqlsh:wiki select * from solr where solr_query='body:sixty eight
 million nine hundred forty three thousand four hundred twenty four';
 Request did not complete within rpc_timeout.

 url encoding just returns without retrieving the row present:
 cqlsh:wiki select count(*) from solr where
 solr_query='body:%22sixty%20eight%20million%20nine%20hundred%20forty%20three%20thousand%20four%20hundred%20twenty%20four%22'
 ;
  count
 ---
     0

 I have exactly one row matching this string that I can retrieve
 through direct solr query.


 Thanks.


Re: Max # of CFs

2012-03-21 Thread A J
I have increased index_interval. Will let you know if I see a difference.


My theory is that memtables are not getting flushed. If I manually
flush them, the heap consumption goes down drastically.

I think when memtable_total_space_in_mb is exceeded, not enough
memtables are getting flushed. There are 5000 memtables (one for each
CF) but each memtable in itself is small. So flushing just one or two
memtables by Cassandra is not helping.

Question: How many memtables are flushed when
memtable_total_space_in_mb is exceeded ? Any way to flush all
memtables when the threshold is reached ?

Thanks.
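
(Since flushing manually brings the heap back down, one stopgap is to force a
flush of every CF in the keyspace on a schedule instead of waiting for
memtable_total_space_in_mb to kick in. A small sketch with Python's subprocess,
assuming nodetool is on the PATH and the host/keyspace names are illustrative;
"nodetool flush <keyspace>" with no CF names flushes all column families in
that keyspace.)

import subprocess

def flush_keyspace(host, keyspace):
    # Force a flush of every memtable in the keyspace on the given node.
    subprocess.check_call(['nodetool', '-h', host, 'flush', keyspace])

flush_keyspace('localhost', 'my_keyspace')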

On Wed, Mar 21, 2012 at 8:56 AM, Vitalii Tymchyshyn tiv...@gmail.com wrote:
 Hello.

 There is also a primary row index. It's space can be controlled with
 index_interval setting. Don't know if you can look for it's memory usage
 somewhere. If I where you, I'd take jmap tool and examine heap histogram
 first, heap dump second.

 Best regards, Vitalii Tymchyshyn

 20.03.12 18:12, A J написав(ла):

 I have both row cache and column cache disabled for all my CFs.

 cfstats says Bloom Filter Space Used: 1760 per CF. Assuming it is in
 bytes, that is a total of about 9MB of bloom filter size for 5K CFs, which
 is not a lot.


 On Tue, Mar 20, 2012 at 11:09 AM, Vitalii Tymchyshyntiv...@gmail.com
  wrote:

 Hello.

 From my experience it's unwise to make many column families for the same keys
 because you will have bloom filters and row indexes multiplied. If you have
 5000, you should expect your heap requirements multiplied by the same factor.
 Also check your cache sizes. Default AFAIR is 10 keys per column family.

 20.03.12 16:05, A J написав(ла):

 ok, the last thread says that 1.0+ onwards, thousands of CFs should
 not be a problem.

 But I am finding that all the allocated heap memory is getting consumed.
 I started with 8GB heap and then on reading


 http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management
 realized that minimum of 1MB per memtable is used by the per-memtable
 arena allocator.
 So with 5K CFs, 5GB will be used just by arena allocators.

 But even on increasing the heap to 16GB, am finding that all the heap
 is getting consumed. Is there a different formula for heap calculation
 when you have thousands of CFs ?
 Any other configuration that I need to change ?

 Thanks.

 On Mon, Mar 19, 2012 at 10:35 AM, Alain RODRIGUEZarodr...@gmail.com
  wrote:

 This subject was already discussed, this may help you :


 http://markmail.org/message/6dybhww56bxvufzf#query:+page:1+mid:6dybhww56bxvufzf+state:results

 If you still got questions after reading this thread or some others
 about
 the same topic, do not hesitate asking again,

 Alain


 2012/3/19 A Js5a...@gmail.com

 How many Column Families are one too many for Cassandra ?
 I created a db with 5000 CFs (I can go into the reasons later) but the
 latency seems to be very erratic now. Not sure if it is because of the
 number of CFs.

 Thanks.





Re: Order rows numerically

2012-03-21 Thread A J
Yes, that is good enough for now. Thanks.

On Fri, Mar 16, 2012 at 6:49 PM, Watanabe Maki watanabe.m...@gmail.com wrote:
 How about to fill zeros before smaller digits?
 Ex. 0001, 0002, etc

 maki


 On 2012/03/17, at 6:29, A J s5a...@gmail.com wrote:

 If I define my rowkeys to be Integer
 (key_validation_class=IntegerType) , how can I order the rows
 numerically ?
 ByteOrderedPartitioner orders lexically, and retrieval using get_range
 does not seem to come back in numerical order.

 If I were to change the rowkey to be UTF8 (key_validation_class=UTF8Type),
 BOP still does not give numerical ordering.
 For a range of rowkeys from 1 to 2, I get 1, 10, 11, ..., 2 (lexical ordering).

 Any workaround for this ?

 Thanks.
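
(A tiny sketch of the zero-fill suggestion above, in Python: with fixed-width
keys, the lexical order used by BOP matches the numeric order, so get_range
comes back numerically ordered. The width of 4 is an assumed maximum key size.)

WIDTH = 4

def to_key(n):
    # Left-pad the number so all keys have the same width.
    return str(n).zfill(WIDTH)

keys = sorted(to_key(n) for n in [1, 2, 10, 11])
print(keys)  # ['0001', '0002', '0010', '0011'], numeric and lexical order agree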


Re: Max # of CFs

2012-03-20 Thread A J
ok, the last thread says that 1.0+ onwards, thousands of CFs should
not be a problem.

But I am finding that all the allocated heap memory is getting consumed.
I started with 8GB heap and then on reading
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management
realized that minimum of 1MB per memtable is used by the per-memtable
arena allocator.
So with 5K CFs, 5GB will be used just by arena allocators.

But even after increasing the heap to 16GB, I am finding that all the heap
is getting consumed. Is there a different formula for heap calculation
when you have thousands of CFs ?
Any other configuration that I need to change ?

Thanks.

On Mon, Mar 19, 2012 at 10:35 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:
 This subject was already discussed, this may help you :
 http://markmail.org/message/6dybhww56bxvufzf#query:+page:1+mid:6dybhww56bxvufzf+state:results

 If you still got questions after reading this thread or some others about
 the same topic, do not hesitate asking again,

 Alain


 2012/3/19 A J s5a...@gmail.com

 How many Column Families are one too many for Cassandra ?
 I created a db with 5000 CFs (I can go into the reasons later) but the
 latency seems to be very erratic now. Not sure if it is because of the
 number of CFs.

 Thanks.




Re: Max # of CFs

2012-03-20 Thread A J
I have both row cache and column cache disabled for all my CFs.

cfstats says Bloom Filter Space Used: 1760 per CF. Assuming it is in
bytes, that is a total of about 9MB of bloom filter size for 5K CFs, which
is not a lot.


On Tue, Mar 20, 2012 at 11:09 AM, Vitalii Tymchyshyn tiv...@gmail.com wrote:
 Hello.

 From my experience it's unwise to make many column families for the same keys
 because you will have bloom filters and row indexes multiplied. If you have
 5000, you should expect your heap requirements multiplied by the same factor.
 Also check your cache sizes. Default AFAIR is 10 keys per column family.

 20.03.12 16:05, A J написав(ла):

 ok, the last thread says that 1.0+ onwards, thousands of CFs should
 not be a problem.

 But I am finding that all the allocated heap memory is getting consumed.
 I started with 8GB heap and then on reading

 http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management
 realized that minimum of 1MB per memtable is used by the per-memtable
 arena allocator.
 So with 5K CFs, 5GB will be used just by arena allocators.

 But even on increasing the heap to 16GB, am finding that all the heap
 is getting consumed. Is there a different formula for heap calculation
 when you have thousands of CFs ?
 Any other configuration that I need to change ?

 Thanks.

 On Mon, Mar 19, 2012 at 10:35 AM, Alain RODRIGUEZarodr...@gmail.com
  wrote:

 This subject was already discussed, this may help you :

 http://markmail.org/message/6dybhww56bxvufzf#query:+page:1+mid:6dybhww56bxvufzf+state:results

 If you still got questions after reading this thread or some others about
 the same topic, do not hesitate asking again,

 Alain


 2012/3/19 A Js5a...@gmail.com

 How many Column Families are one too many for Cassandra ?
 I created a db with 5000 CFs (I can go into the reasons later) but the
 latency seems to be very erratic now. Not sure if it is because of the
 number of CFs.

 Thanks.





Max # of CFs

2012-03-19 Thread A J
How many Column Families are one too many for Cassandra ?
I created a db with 5000 CFs (I can go into the reasons later) but the
latency seems to be very erratic now. Not sure if it is because of the
number of CFs.

Thanks.


Order rows numerically

2012-03-16 Thread A J
If I define my rowkeys to be Integer
(key_validation_class=IntegerType) , how can I order the rows
numerically ?
ByteOrderedPartitioner orders lexically, and retrieval using get_range
does not seem to come back in numerical order.

If I were to change the rowkey to be UTF8 (key_validation_class=UTF8Type),
BOP still does not give numerical ordering.
For a range of rowkeys from 1 to 2, I get 1, 10, 11, ..., 2 (lexical ordering).

Any workaround for this ?

Thanks.


Re: Does the 'batch' order matter ?

2012-03-14 Thread A J
 No, batch_mutate() is an atomic operation.  When a node locally applies a 
 batch mutation, either all of the changes are applied or none of them are.
The steps in my batch are not confined to a single CF, nor to a single key.

The documentation says:
datastax:
Column updates are only considered atomic within a given record (row).

Pycassa.batch:
This interface does not implement atomic operations across column
families. All the limitations of the batch_mutate Thrift API call
applies. Remember, a mutation in Cassandra is always atomic per key
per column family only.


On Wed, Mar 14, 2012 at 4:15 PM, Tyler Hobbs ty...@datastax.com wrote:
 On Wed, Mar 14, 2012 at 11:50 AM, A J s5a...@gmail.com wrote:


 Are you saying the way 'batch mutate' is coded, the order of writes in
 the batch does not mean anything ? You can ask the batch to do A,B,C
 and then D in sequence; but sometimes Cassandra can end up applying
 just C and A,B (and D) may still not be applied ?


 No, batch_mutate() is an atomic operation.  When a node locally applies a
 batch mutation, either all of the changes are applied or none of them are.

 Aaron was referring to the possibility that one of the replicas received the
 batch_mutate, but the other replicas did not.

 --
 Tyler Hobbs
 DataStax



Does the 'batch' order matter ?

2012-03-13 Thread A J
I know batch operations are not atomic but does the success of a write
imply all writes preceeding it in the batch were successful ?

For example, using cql:
BEGIN BATCH USING CONSISTENCY QUORUM AND TTL 864
  INSERT INTO users (KEY, password, name) VALUES ('user2',
'ch@ngem3b', 'second user')
  UPDATE users SET password = 'ps22dhds' WHERE KEY = 'user2'
  INSERT INTO users (KEY, password) VALUES ('user3', 'ch@ngem3c')
  DELETE name FROM users WHERE key = 'user2'
  INSERT INTO users (KEY, password, name) VALUES ('user4',
'ch@ngem3c', 'Andrew')
APPLY BATCH;

Say the batch failed but I see that the third write was present on a
node. Does it imply that the first insert and the second update
definitely made it to that node as well ?

Thanks.


Test Data creation in Cassandra

2012-03-02 Thread A J
What is the best way to create millions of rows of test data in Cassandra ?

I would like to have some script where I first insert say 100 rows in
a CF, then reinsert the same data on the 'server side' with new unique
keys. That will make it 200 rows. Then continue the exercise a few
times till I get a lot of records.
I don't care if the column names and values are identical between the
different rows. Just a lot of records generated for a few seed
records.

The rows are very fat. So I don't want to use any client side
scripting that would push individual or batched rows to cassandra.

Thanks for any tips.


Logging 'write' operations

2012-02-21 Thread A J
Hello,
What is the best way to log write operations (insert, remove, counter
add, batch operations) in Cassandra ? I need to store the operations
(with the values being passed) in some fashion or another for audit
purposes (and possibly to undo some operation after inspection).

Thanks.
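
(As far as I know there is no built-in write audit log in Cassandra at this
point, so one client-side approach is to wrap every write so that the same
mutation, values included, is also recorded in a separate audit CF. A rough
pycassa sketch; the 'audit_log' CF and its column layout are invented for
illustration.)

import json
import time
import uuid

import pycassa

pool = pycassa.ConnectionPool('my_keyspace', ['localhost:9160'])
data_cf = pycassa.ColumnFamily(pool, 'users')
audit_cf = pycassa.ColumnFamily(pool, 'audit_log')

def audited_insert(key, columns):
    # Apply the write, then record what was written for later inspection/undo.
    data_cf.insert(key, columns)
    audit_cf.insert(str(uuid.uuid1()), {
        'op': 'insert',
        'cf': 'users',
        'key': key,
        'values': json.dumps(columns),
        'at': str(int(time.time())),
    })

audited_insert('user1', {'name': 'Alice'})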


batch mode and flushing

2012-02-02 Thread A J
Hello, when you set 'commitlog_sync: batch' on all the nodes in a
multi-DC cluster and call writes with CL=ALL, does the operation wait
till the write is flushed to all the disks on all the nodes ?

Thanks.


Command to display config values

2012-01-24 Thread A J
Is there a command in cqlsh or cassandra-cli that can display the
various values of the configuration parameters in use ?
I am particularly interested in finding the value of 'commitlog_sync'
that the current session is using.

Thanks.
AJ


Re: Command to display config values

2012-01-24 Thread A J
Yes, I can see the yaml files. But I need to confirm through some
database query that the change in yaml on node restart was picked up
by the database.

On Tue, Jan 24, 2012 at 7:07 PM, aaron morton aa...@thelastpickle.com wrote:
 Nothing through those API's, can you check the yaml file ?

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 25/01/2012, at 10:10 AM, A J wrote:

 Is there a command in cqlsh or cassandra CLI that can display the
 various values of the configuration parameters at use.
 I am particularly interested in finding the value of ' commitlog_sync'
 that the current session is using ?

 Thanks.
 AJ




Encryption related question

2012-01-20 Thread A J
Hello,
I am trying to use internode encryption in Cassandra (1.0.6) for the first time.

1. Followed the steps 1 to 5 at
http://download.oracle.com/javase/6/docs/technotes/guides/security/jsse/JSSERefGuide.html#CreateKeystore
Q. In cassandra.yaml , what value goes for keystore ? I exported the
certificate per step #3 above in duke.cer. Do I put the location and
name of that file for this parameter ?
Similarly, what value goes for truststore ? The steps 1-5 don't
indicate any other file to be exported that would possibly go here.

Also do I need to follow these steps on each of the node ?

Thanks
AJ


Restart for change of endpoint_snitch ?

2011-12-27 Thread A J
If I change endpoint_snitch from SimpleSnitch to PropertyFileSnitch,
does it require restart of cassandra on that node ?

Thanks.


java thrift error

2011-12-20 Thread A J
The following syntax :
import org.apache.cassandra.thrift.*;
.
.
ColumnOrSuperColumn col = client.get(count_key.getBytes("UTF-8"),
cp, ConsistencyLevel.QUORUM);


is giving the error:
get(java.nio.ByteBuffer,org.apache.cassandra.thrift.ColumnPath,org.apache.cassandra.thrift.ConsistencyLevel)
in org.apache.cassandra.thrift.Cassandra.Client cannot be applied to
(byte[],org.apache.cassandra.thrift.ColumnPath,org.apache.cassandra.thrift.ConsistencyLevel)


Any idea on how to cast?

Thanks.


Re: setStrategy_options syntax in thrift

2011-12-20 Thread A J
I am new to java. Can you specify the exact syntax for replication_factor=2 ?

Thanks.

On Tue, Dec 20, 2011 at 1:50 PM, aaron morton aa...@thelastpickle.com wrote:
 It looks like you tried to pass the string {replication_factor:2}

 You need to pass a Map<String, String> type, where the key is the option
 and the value is the option value.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 20/12/2011, at 12:02 PM, A J wrote:

 What is the syntax of setStrategy_options in thrift.

 The following fails:

 Util.java:22:
 setStrategy_options(java.util.Map<java.lang.String,java.lang.String>)
 in org.apache.cassandra.thrift.KsDef cannot be applied to
 (java.lang.String)
    newKs.setStrategy_options("{replication_factor:2}");




Re: java thrift error

2011-12-20 Thread A J
The following worked:
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
.
public static Charset charset = Charset.forName("UTF-8");
public static CharsetEncoder encoder = charset.newEncoder();
public static CharsetDecoder decoder = charset.newDecoder();

public static ByteBuffer str_to_bb(String msg){
  try{
return encoder.encode(CharBuffer.wrap(msg));
  }catch(Exception e){e.printStackTrace();}
  return null;
}

and then instead of count_key.getBytes("UTF-8")
do
str_to_bb(count_key)

On Tue, Dec 20, 2011 at 4:03 PM, Dave Brosius dbros...@mebigfatguy.comwrote:

 A ByteBuffer is not a byte[] to convert a String to a ByteBuffer do
 something like

  public static ByteBuffer toByteBuffer(String value) throws UnsupportedEncodingException {
      return ByteBuffer.wrap(value.getBytes("UTF-8"));
  }



 see http://wiki.apache.org/cassandra/ThriftExamples


 *- Original Message -*
 *From:* A J s5a...@gmail.com
 *Sent:* Tue, December 20, 2011 15:52
 *Subject:* java thrift error

 The following syntax :
 import org.apache.cassandra.thrift.*;
 .
 .
  ColumnOrSuperColumn col = client.get(count_key.getBytes("UTF-8"),
 cp, ConsistencyLevel.QUORUM);


 is giving the error:
 get(java.nio.ByteBuffer,org.apache.cassandra.thrift.ColumnPath,org.apache.cassandra.thrift.ConsistencyLevel)
 in org.apache.cassandra.thrift.Cassandra.Client cannot be applied to
 (byte[],org.apache.cassandra.thrift.ColumnPath,org.apache.cassandra.thrift.ConsistencyLevel)


 Any idea on how to cast?

 Thanks.




Re: setStrategy_options syntax in thrift

2011-12-20 Thread A J
Thanks, that worked.

On Tue, Dec 20, 2011 at 4:08 PM, Dave Brosius dbros...@mebigfatguy.comwrote:


 KsDef ksDef = new KsDef();
  Map<String, String> options = new HashMap<String, String>();
  options.put("replication_factor", "2");
 ksDef.setStrategy_options(options);



 *- Original Message -*
 *From:* A J s5a...@gmail.com
 *Sent:* Tue, December 20, 2011 16:03
 *Subject:* Re: setStrategy_options syntax in thrift

 I am new to java. Can you specify the exact syntax for replication_factor=2 ?

 Thanks.

 On Tue, Dec 20, 2011 at 1:50 PM, aaron morton aa...@thelastpickle.com 
 wrote: It looks like you tried to pass the string {replication_factor:2} 
 You need to pas a MapString, String type , where the the key is the option 
 and the value is the option value. Cheers - Aaron 
 Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 
 20/12/2011, at 12:02 PM, A J wrote: What is the syntax of 
 setStrategy_options in thrift. The following fails: Util.java:22: 
 setStrategy_options(java.util.Mapjava.lang.String,java.lang.String) in 
 org.apache.cassandra.thrift.KsDef cannot be applied to (java.lang.String)   
  newKs.setStrategy_options({replication_factor:2});




setStrategy_options syntax in thrift

2011-12-19 Thread A J
What is the syntax of setStrategy_options in thrift.

The following fails:

Util.java:22: 
setStrategy_options(java.util.Map<java.lang.String,java.lang.String>)
in org.apache.cassandra.thrift.KsDef cannot be applied to
(java.lang.String)
newKs.setStrategy_options("{replication_factor:2}");


each_quorum in pycassa

2011-12-12 Thread A J
What is the syntax for each_quorum in pycassa ?

Thanks.
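
(A short sketch; the keyspace/CF names are made up and the exact import path
for ConsistencyLevel may differ slightly between pycassa versions. The
consistency level can be set as a default on the ColumnFamily object or passed
per call.)

import pycassa
from pycassa.cassandra.ttypes import ConsistencyLevel

pool = pycassa.ConnectionPool('my_keyspace', ['localhost:9160'])

# Default for all writes issued through this ColumnFamily object:
cf = pycassa.ColumnFamily(pool, 'users',
                          write_consistency_level=ConsistencyLevel.EACH_QUORUM)
cf.insert('rowkey', {'name': 'value'})

# Or per call:
cf.insert('rowkey', {'name': 'value'},
          write_consistency_level=ConsistencyLevel.EACH_QUORUM)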


Increase replication factor

2011-12-05 Thread A J
If I update a keyspace to increase the replication factor, what
happens to the existing data for that keyspace ? Does the existing data
automatically get replicated to the additional replicas ? Or does the existing
data only reach the new replication factor on read repair or node repair ?

Thanks.


garbage collecting tombstones

2011-12-01 Thread A J
Hello,
Is 'garbage collecting tombstones' a different operation from the JVM GC ?
Garbage collecting tombstones is controlled by gc_grace_seconds, which
by default is set to 10 days. But the traditional GC seems to happen
much more frequently (when observed through jconsole).

How can I force the garbage collecting tombstones to happen ad-hoc
when I want to ?

Thanks.


Re: (A or B) AND C AND !D

2011-11-15 Thread A J
To clarify, I wish to keep N=4 and W=2 in the following scenario.

Thanks.

On Sun, Nov 13, 2011 at 11:20 PM, A J s5a...@gmail.com wrote:
 Hello
 Say I have 4 nodes: A, B, C and D and wish to have consistency level
 for writes defined in such a way that writes meet the following
 consistency level:
 (A or B) AND C AND !D,
 i.e. either of A or B will suffice, and C must be included in the
 consistency level as well. But the write should not wait for D.

 Is such a configuration possible ?
 I tried various combinations of EACH_QUORUM and LOCAL_QUORUM and
 clubbing the nodes in different DCs but could not really come up with
 a solution. Maybe I am missing something.

 Thanks.



Continuous export of data out of database

2011-11-15 Thread A J
Hello
VoltDB has an export feature to stream the data out of the database.
http://voltdb.com/company/blog/voltdb-export-connecting-voltdb-other-systems

This is different from Cassandra's export feature
(http://wiki.apache.org/cassandra/Operations#Import_.2BAC8_export)
which is more of a different way of snapshotting.

My question is : is streaming data out on a continuous basis (as in
VoltDB) possible in some fashion in Cassandra ?

Thanks
Bala


Re: Continuous export of data out of database

2011-11-15 Thread A J
The issue with that is that I wish to have EACH_QUORUM across our other 2
datacenters but not in the third DC.
I could not figure out a way to accomplish that, so I am exploring having a
near-realtime backup copy in the third DC via some streaming process.

On Tue, Nov 15, 2011 at 12:12 PM, Robert Jackson
robe...@promedicalinc.com wrote:
 The thing that I thought if initially would be setting up your cluster in a
 multi-datacenter config[1].  In that scenario you could add an additional
 machine in a second datacenter with RF=1.  We are using a variant of this
 setup to separate long running calculations from our interactive systems.
 [1] -
 http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers

 Robert Jackson
 


(A or B) AND C AND !D

2011-11-13 Thread A J
Hello
Say I have 4 nodes: A, B, C and D and wish to have consistency level
for writes defined in such a way that writes meet the following
consistency level:
(A or B) AND C AND !D,
i.e. either of A or B will suffice, and C must be included in the
consistency level as well. But the write should not wait for D.

Is such a configuration possible ?
I tried various combinations of EACH_QUORUM and LOCAL_QUORUM and
clubbing the nodes in different DCs but could not really come up with
a solution. Maybe I am missing something.

Thanks.


OOM

2011-11-02 Thread A J
Hi,

For a single node of Cassandra (version 1.0) having 15GB of data+index,
48GB RAM, an 8GB heap and about a 2.6GB memtable threshold, I am getting OOM
when I have 1000 concurrent inserts happening at the same time.
I have kept concurrent_writes: 128 in cassandra.yaml as there are a
total of 16 cores (suggestion is to keep 8 * number_of_cores).

Can someone give pointers on what needs to be tuned.

Thanks, AJ.



ERROR 00:10:00,312 Fatal exception in thread Thread[Thread-3,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:614)
at 
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:943)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1336)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:105)
at 
org.apache.cassandra.thrift.CassandraDaemon$ThriftServer.run(CassandraDaemon.java:213)


Re: cassandra-cli describe / dump command

2011-09-02 Thread J T
Thats brilliant, thanks.

On Thu, Sep 1, 2011 at 7:07 PM, Jonathan Ellis jbel...@gmail.com wrote:

 yes, cli show schema in 0.8.4+

 On Thu, Sep 1, 2011 at 12:52 PM, J T jt4websi...@googlemail.com wrote:
  Hi,
 
  I'm probably being blind .. but I can't see any way to dump the schema
  definition (and the data in it for that matter)  of a cluster in order to
   capture the current schema in a script file for subsequent replaying into a
   different environment.
 
  For example, say I have a DEV env and wanted to create a script
 containing
  the cli commands to create that schema in a UAT env.
 
  In my case, I have a cassandra schema I've been tweaking / upgrading over
  the last 2 years and I can't see any easy way to capture the schema
  definition.
 
  Is such a thing on the cards for cassandra-cli ?
 
  JT
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Cassandra, CQL, Thrift Deprecation?? and Erlang

2011-09-02 Thread J T
Ok, thats good to know.

If push came to shove I could probably write such a client myself after
doing the necessary research but I'd prefer to save myself the hassle.

Thanks.

On Fri, Sep 2, 2011 at 1:59 PM, Jonathan Ellis jbel...@gmail.com wrote:

 The Thrift API is not going anywhere any time soon.

 I'm not aware of anyone working on an erlang CQL client.

 On Fri, Sep 2, 2011 at 7:39 AM, J T jt4websi...@googlemail.com wrote:
  Hi,
 
  I'm a fan of erlang, and have been using successive cassandra versions
 via
  the erlang thrift interface for a couple of years now.
 
  I see that cassandra seems to be moving to using CQL instead and so I was
   wondering if that means the Thrift API will be deprecated and, if so, whether there is
   any effort underway by anyone to create (whatever would be necessary) to
   use cassandra via CQL from erlang ?
 
  JT
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'

2011-07-12 Thread A J
From Cassandra the definitive guide - Basic Maintenance - Repair
Running nodetool repair causes Cassandra to execute a major compaction.
During a major compaction (see “Compaction” in the Glossary), the
server initiates a
TreeRequest/TreeReponse conversation to exchange Merkle trees with neighboring
nodes.

So is this text from the book misleading ?

On Fri, Jul 8, 2011 at 10:36 AM, Jonathan Ellis jbel...@gmail.com wrote:
 that's an internal term meaning background i/o, not sstable merging per se.

 On Fri, Jul 8, 2011 at 9:24 AM, A J s5a...@gmail.com wrote:
 I think node repair involves some compaction too. See the issue:
 https://issues.apache.org/jira/browse/CASSANDRA-2811
 It talks of 'validation compaction' being triggered concurrently
 during node repair.

 On Thu, Jun 30, 2011 at 8:51 PM, Watanabe Maki watanabe.m...@gmail.com 
 wrote:
 Repair doesn't compact. Those are different processes already.

 maki


 On 2011/07/01, at 7:21, A J s5a...@gmail.com wrote:

 Thanks all !
 In other words, I think it is safe to say that a node as a whole can
 be made consistent only on 'nodetool repair'.

 Has there been enough interest in providing anti-entropy without
 compaction as a separate operation (nodetool repair does both) ?


 On Thu, Jun 30, 2011 at 5:27 PM, Jonathan Ellis jbel...@gmail.com wrote:
 On Thu, Jun 30, 2011 at 3:47 PM, Edward Capriolo edlinuxg...@gmail.com 
 wrote:
 Read repair does NOT repair tombstones.

 It does, but you can't rely on RR to repair _all_ tombstones, because
 RR only happens if the row in question is requested by a client.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com






 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'

2011-07-12 Thread A J
Just confirming. Thanks for the clarification.

On Tue, Jul 12, 2011 at 10:53 AM, Peter Schuller
peter.schul...@infidyne.com wrote:
 From Cassandra the definitive guide - Basic Maintenance - Repair
 Running nodetool repair causes Cassandra to execute a major compaction.
 During a major compaction (see “Compaction” in the Glossary), the
 server initiates a
 TreeRequest/TreeReponse conversation to exchange Merkle trees with 
 neighboring
 nodes.

 So is this text from the book misleading ?

 It's just being a bit less specific (I suppose maybe misleading can be
 claimed). If you repair everything on a node, that will imply a
 validating compaction (i.e., do the read part of the compaction stage
 but don't merge to and write new sstables) which is expensive for the
 usual reasons with disk I/O; it's major since it covers all data.
 The data read is in fact used to calculate a merkle tree for
 comparison with neighbors, as claimed.

 --
 / Peter Schuller



Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'

2011-07-11 Thread A J
Instead of doing nodetool repair, is it not a cheaper operation to
keep tabs on failed writes (be they deletes or inserts or updates) and
read these failed writes at a set frequency in some batch job ? By
reading them, RR would get triggered and they would get to a
consistent state.

Because these would be targeted reads (only for those that failed during
writes), it should be a shorter list and quick to repair (than
nodetool repair).


On Thu, Jun 30, 2011 at 5:27 PM, Jonathan Ellis jbel...@gmail.com wrote:
 On Thu, Jun 30, 2011 at 3:47 PM, Edward Capriolo edlinuxg...@gmail.com 
 wrote:
 Read repair does NOT repair tombstones.

 It does, but you can't rely on RR to repair _all_ tombstones, because
 RR only happens if the row in question is requested by a client.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'

2011-07-11 Thread A J
Never mind. I see the issue with this. I will be able to catch the
writes as failed only if I set CL=ALL. For other CLs, I may not know
that it failed on some node.

On Mon, Jul 11, 2011 at 2:33 PM, A J s5a...@gmail.com wrote:
 Instead of doing nodetool repair, is it not a cheaper operation to
 keep tab of failed writes (be it deletes or inserts or updates) and
 read these failed writes at a set frequency in some batch job ? By
 reading them, RR would get triggered and they would get to a
 consistent state.

 Because these would targeted reads (only for those that failed during
 writes), it should be a shorter list and quick to repair (than
 nodetool repair).


 On Thu, Jun 30, 2011 at 5:27 PM, Jonathan Ellis jbel...@gmail.com wrote:
 On Thu, Jun 30, 2011 at 3:47 PM, Edward Capriolo edlinuxg...@gmail.com 
 wrote:
 Read repair does NOT repair tombstones.

 It does, but you can't rely on RR to repair _all_ tombstones, because
 RR only happens if the row in question is requested by a client.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com




Node repair questions

2011-07-11 Thread A J
Hello,
Have the following questions related to nodetool repair:
1. I know that Nodetool Repair Interval has to be less than
GCGraceSeconds. How do I come up with an exact value of GCGraceSeconds
and 'Nodetool Repair Interval'. What factors would want me to change
the default of 10 days of GCGraceSeconds. Similarly what factors would
want me to keep Nodetool Repair Interval to be just slightly less than
GCGraceSeconds (say a day less).

2. Does a Nodetool Repair block any reads and writes on the node,
while the repair is going on ? During repair, if I try to do an
insert, will the insert wait for repair to complete first ?

3. I read that repair can impact your workload as it causes additional
disk and cpu activity. But any details of the impact mechanism and any
ballpark on how much the read/write performance deteriorates ?

Thanks.


When is 'Cassandra High Performance Cookbook' expected to be available ?

2011-07-07 Thread A J
https://www.packtpub.com/cassandra-apache-high-performance-cookbook/book


List nodes where write was applied to

2011-07-07 Thread A J
Is there a way to find which nodes a write was applied to ? It
could be a successful write (i.e. W was met) or an unsuccessful write
(i.e. fewer than W nodes were met). In either case, I am interested in
finding:
Number of nodes written to (before timeout or on success)
Name of nodes written to (before timeout or on success)

Thanks.


What does a write lock ?

2011-07-07 Thread A J
Does a write lock:
1. Just the columns in question for the specific row in question ?
2. The full row in question ?
3. The full CF ?

I doubt reads take any locks.

Thanks.


'select * from cf' - FTS or Index

2011-07-07 Thread A J
Does a 'select * from cf'  with no filter still use the primary
index on the key or do a 'full table scan' ?

Thanks.

