RE: Seed gossip version error

2015-07-21 Thread DE VITO Dominique
Hi Amlan,

We have the same problem with Cassandra 2.1.5.

I have no lead to follow yet.

Did you find the root cause of this problem?

Thanks.

Regards,
Dominique



From: Amlan Roy [mailto:amlan@cleartrip.com]
Sent: Wednesday, 1 July 2015 12:46
To: user@cassandra.apache.org
Subject: Seed gossip version error

Hi,

I have a cluster running version 2.1.7. Two of the machines went down and 
have not rejoined the cluster, even after a restart. I see the following 
WARN message in system.log on all the nodes:
system.log:WARN  [MessagingService-Outgoing-cassandra2.cleartrip.com/172.18.3.32]
 2015-07-01 13:00:41,878 OutboundTcpConnection.java:414 - Seed gossip version 
is -2147483648; will not connect with that version

Please let me know if you have faced the same problem.

Regards,
Amlan




Re: Schema questions for data structures with recently-modified access patterns

2015-07-21 Thread Robert Wille
The time series doesn’t provide the access pattern I’m looking for. It gives 
no way to query for recently modified documents.

On 21 July 2015 at 05:59, Robert Wille rwi...@fold3.com wrote:
Data structures that have a recently-modified access pattern seem to be a poor 
fit for Cassandra. I’m wondering if any of you smart guys can provide 
suggestions.

For the sake of discussion, let's assume I have the following tables:

CREATE TABLE document (
  docId UUID,
  doc TEXT,
  last_modified TIMEUUID,
  PRIMARY KEY ((docId))
);

CREATE TABLE doc_by_last_modified (
  date TEXT,
  last_modified TIMEUUID,
  docId UUID,
  PRIMARY KEY ((date), last_modified)
);

When I update a document, I retrieve its last_modified time, delete the current 
record from doc_by_last_modified, and add a new one. Unfortunately, if you’d 
like each document to appear at most once in the doc_by_last_modified table, 
then this doesn’t work so well.

Documents can get into the doc_by_last_modified table multiple times if there 
is concurrent access, or if there is a consistency issue.
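
The concurrent-access case can be illustrated with a small simulation. This is only a sketch: the two dictionaries are in-memory stand-ins for the tables above, not a Cassandra client, integers stand in for TIMEUUID values, and the function names are hypothetical.

```python
# In-memory stand-ins for the two tables; not a Cassandra client.
document = {}               # docId -> current (day, last_modified) key
doc_by_last_modified = {}   # (day, last_modified) -> docId


def read_last_modified(doc_id):
    """Step 1 of an update: look up the document's current index key."""
    return document.get(doc_id)


def apply_update(doc_id, old_key, new_key):
    """Steps 2 and 3: delete the old index entry, then write the new one."""
    if old_key is not None:
        doc_by_last_modified.pop(old_key, None)
    doc_by_last_modified[new_key] = doc_id
    document[doc_id] = new_key


# Initial state: the document was last modified at t=1.
apply_update("doc-1", None, ("2015-07-21", 1))

# Two writers interleave: both read the old key before either one writes.
seen_by_a = read_last_modified("doc-1")
seen_by_b = read_last_modified("doc-1")
apply_update("doc-1", seen_by_a, ("2015-07-21", 2))
apply_update("doc-1", seen_by_b, ("2015-07-21", 3))

# Writer B deleted the t=1 entry a second time and never saw the t=2 entry.
entries = sorted(k for k, d in doc_by_last_modified.items() if d == "doc-1")
print(entries)
```

After the interleaving, the index holds two entries for the same document (t=2 and t=3), which is the duplicate described above.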

Any thoughts out there on how to efficiently provide recently-modified access 
to a table? This problem exists for many types of data structures, not just 
recently-modified ones. Any ordered data structure that can be dynamically 
reordered suffers from the same problems. As I’ve been doing schema design, 
this pattern keeps recurring. A nice way to address this problem would have 
lots of applications.

Thanks in advance for your thoughts.

Robert





Re: Schema questions for data structures with recently-modified access patterns

2015-07-21 Thread Carlos Alonso
Hi Robert,

What about modelling it as a time series?

CREATE TABLE document (
  docId UUID,
  doc TEXT,
  last_modified TIMESTAMP,
  PRIMARY KEY (docId, last_modified)
) WITH CLUSTERING ORDER BY (last_modified DESC);

This way, the latest modification will always be the first record in
the row, so accessing it should be as easy as:

SELECT * FROM document WHERE docId = ? LIMIT 1;  -- bind the docId

And if you run into disk space issues due to very long rows, you can
always expire old versions using a TTL or a batch job. Tombstones will never
be a problem in this case because, due to the specified clustering order, the
latest modification will always be the first record in the row.

Hope it helps.

Carlos Alonso | Software Engineer | @calonso https://twitter.com/calonso





Re: Schema questions for data structures with recently-modified access patterns

2015-07-21 Thread Jack Krupansky
Keep the original document base table, but then the query table should have
the PK as last_modified, docId, with last_modified descending, so that a
query can get the n most recently modified documents.

Yes, you still need to manually delete the old entry for the document in
the query table if duplicates are a problem for you.

Yeah, a TTL would be good if you don't care about documents modified a
month or a week ago.
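
As a rough illustration of the layout described above (newest first, old entry deleted on update), here is a sketch that uses an in-memory sorted list as a stand-in for the query table. The names `index`, `current`, `touch` and `most_recent` are hypothetical, and plain numbers stand in for the last_modified values.

```python
import bisect

# Stand-in for the query table: (-last_modified, doc_id) tuples kept sorted,
# so the most recently modified document always comes first.
index = []
# Stand-in for the base table: doc_id -> last_modified.
current = {}


def touch(doc_id, last_modified):
    """Record a modification: drop the old index entry, then add the new one."""
    old = current.get(doc_id)
    if old is not None:
        pos = bisect.bisect_left(index, (-old, doc_id))
        if pos < len(index) and index[pos] == (-old, doc_id):
            del index[pos]
    bisect.insort(index, (-last_modified, doc_id))
    current[doc_id] = last_modified


def most_recent(n):
    """Return the ids of the n most recently modified documents."""
    return [doc_id for _, doc_id in index[:n]]


touch("a", 1)
touch("b", 2)
touch("a", 3)
print(most_recent(2))
```

Because the old entry for "a" is removed before the new one is added, each document appears once and the newest modification is read first. The sketch runs in a single thread, so it does not show what happens when two updates overlap.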

-- Jack Krupansky






RE: Seed gossip version error

2015-07-21 Thread DE VITO Dominique
Thanks for your reply.

Yes, I am sure all nodes are running the same version.

On second thought, I think my gossip problem is due to intense GC activity, 
which leaves the node unable even to complete a gossip handshake!

Regards,
Dominique






Re: Seed gossip version error

2015-07-21 Thread Carlos Rolo
That error should only occur when there is a mismatch between the seed
version and the new node version. Are you sure all your nodes are running
the same version?

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com






Re: Schema questions for data structures with recently-modified access patterns

2015-07-21 Thread Robert Wille
If last_modified is a clustering column, it needs a partitioning column, which 
is what date is for (although I should have named it day, and I also forgot to 
add the ORDER BY DESC clause). This is essentially what I came up with. I still 
don't like how easy it is to get duplicates.







howto do sql query like in a relational database

2015-07-21 Thread anton
Hi,

I have a simple (perhaps stupid) question.

If I want to *search* data in Cassandra,
how could I find all records in a text field
that start with 'Cas'?
(In SQL I would do select * from table where field like 'Cas%'.)

I know that this is not directly possible.

 - But how is it possible?

 - Does nobody need to search for text fragments?
   If not, is there a small example that explains
   *why* this is not needed?

As far as I understand, databases are great for *searching*
data. For numerical data in Cassandra I can use >, <, =
and all those operators.

Is Cassandra intended to be used mostly for numerical data?

I have not gotten the point so far, sorry.

 Anton




RE: howto do sql query like in a relational database

2015-07-21 Thread Peer, Oded
Cassandra is a highly scalable, eventually consistent, distributed, structured 
key-value store: http://wiki.apache.org/cassandra/
It is intended for searching by key. It has more querying options, but it 
really shines when querying by key.

Not all databases offer the same functionality. Both a knife and a fork are 
eating utensils, but you wouldn't want to cut a tomato with a fork.
There are text-indexing databases out there that might suit your needs better. 
Try Elasticsearch.
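
For the specific case of a 'Cas%' prefix, one general technique (not something proposed in this thread) is to rewrite the prefix match as a range over sorted keys: everything from 'Cas' up to, but not including, 'Cat'. The sketch below shows the idea on a plain sorted Python list. Whether it maps onto a given Cassandra schema, for example as a range restriction on a text clustering column within one partition, is an assumption that depends on the data model.

```python
import bisect


def prefix_range(prefix):
    """Return (low, high) such that low <= s < high holds for strings starting with prefix.

    Assumes a non-empty prefix whose last character is not the largest code point.
    """
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)


def starts_with(sorted_values, prefix):
    """Find all values with the given prefix using two binary searches."""
    low, high = prefix_range(prefix)
    start = bisect.bisect_left(sorted_values, low)
    end = bisect.bisect_left(sorted_values, high)
    return sorted_values[start:end]


names = sorted(["Cassandra", "Casper", "Cat", "Cab", "Elasticsearch", "Cas"])
print(starts_with(names, "Cas"))
```

This only works for prefixes. A search for a fragment in the middle of a string cannot be expressed as a key range, which is where a text index is the better tool.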





Re: Can't connect to Cassandra server

2015-07-21 Thread Chamila Wijayarathna
Hi Erick,

In cassandra-env.sh, system_memory_in_mb was set to 2 GB. I changed it to
16 GB, but I still get the same issue. Below are my complete system.log
after changing cassandra-env.sh, and the new cassandra-env.sh.

https://gist.githubusercontent.com/cdwijayarathna/5e7e69c62ac09b45490b/raw/f73f043a6cd68eb5e7f93cf597ec514df7ac61ae/log
https://gist.github.com/cdwijayarathna/2665814a9bd3c47ba650

I can't find an output.log in my Cassandra installation.

Thanks

On Tue, Jul 21, 2015 at 4:31 AM, Erick Ramirez er...@ramirez.com.au wrote:

 Chamila,

 As you can see from the netstat/lsof output, there is nothing listening on
 port 9042 because Cassandra has not started yet. This is the reason you are
 unable to connect via cqlsh.

 You need to work out first why Cassandra has not started.

 With regards to JVM, Oded is referring to the max heap size and new heap
 size you have configured. The suspicion is that you have max heap size set
 too low which is apparent from the heap pressure and GC pattern in the log
 you provided.

 Please provide the gist for the following so we can assist:
 - updated system.log
 - copy of output.log
 - cassandra-env.sh

 Cheers,
 Erick

 *Erick Ramirez*
 About Me about.me/erickramirezonline




-- 
*Chamila Dilshan Wijayarathna,*
Software Engineer
Mobile:(+94)788193620
WSO2 Inc., http://wso2.com/


Re: DateTieredCompactionStrategy DTCS sometimes stop dropping SSTables

2015-07-21 Thread Robert Coli
On Mon, Jul 20, 2015 at 6:20 PM, Christophe Schmitz 
christo...@instaclustr.com wrote:

 I am running a 6 node cluster on 2.1.7 ...


Sounds similar to :
https://issues.apache.org/jira/browse/CASSANDRA-9577 or maybe
https://issues.apache.org/jira/browse/CASSANDRA-9056 or
https://issues.apache.org/jira/browse/CASSANDRA-8243

The latter two should both be fixed by 2.1.7...

=Rob


Re: High CPU load

2015-07-21 Thread Marcin Pietraszek
Yup... it seems to be the GC's fault.

GC logs:

2015-07-21T14:19:54.336+0000: 2876133.270: Total time for which application threads were stopped: 0.0832030 seconds
2015-07-21T14:19:55.739+0000: 2876134.673: Total time for which application threads were stopped: 0.0806960 seconds
2015-07-21T14:19:57.149+0000: 2876136.083: Total time for which application threads were stopped: 0.0806890 seconds
2015-07-21T14:19:58.550+0000: 2876137.484: Total time for which application threads were stopped: 0.0821070 seconds
2015-07-21T14:19:59.941+0000: 2876138.875: Total time for which application threads were stopped: 0.0802640 seconds
2015-07-21T14:20:01.340+0000: 2876140.274: Total time for which application threads were stopped: 0.0835670 seconds
2015-07-21T14:20:02.744+0000: 2876141.678: Total time for which application threads were stopped: 0.0842440 seconds
2015-07-21T14:20:04.143+0000: 2876143.077: Total time for which application threads were stopped: 0.0841630 seconds
2015-07-21T14:20:05.541+0000: 2876144.475: Total time for which application threads were stopped: 0.0839850 seconds

Heap after GC invocations=2273737 (full 101):
 par new generation   total 1474560K, used 106131K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
  eden space 1310720K,   0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000)
  from space 163840K,  64% used [0x000000064ae00000, 0x00000006515a4ee0, 0x0000000654e00000)
  to   space 163840K,   0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000)
 concurrent mark-sweep generation total 6750208K, used 1316691K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 49336K, used 29520K [0x00000007fae00000, 0x00000007fde2e000, 0x0000000800000000)
}
2015-07-21T14:12:05.683+0000: 2875664.617: Total time for which application threads were stopped: 0.0830280 seconds
{Heap before GC invocations=2273737 (full 101):
 par new generation   total 1474560K, used 1416851K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
  eden space 1310720K, 100% used [0x00000005fae00000, 0x000000064ae00000, 0x000000064ae00000)
  from space 163840K,  64% used [0x000000064ae00000, 0x00000006515a4ee0, 0x0000000654e00000)
  to   space 163840K,   0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000)
 concurrent mark-sweep generation total 6750208K, used 1316691K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 49336K, used 29520K [0x00000007fae00000, 0x00000007fde2e000, 0x0000000800000000)

It seems like eden heap space is being constantly occupied by
something which is later removed by gc...
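
To put a number on the pause lines above, a small script can pull out the JVM uptime and the stopped time from each line. This is a sketch: it assumes each log entry sits on a single line, the +0000 offset in the sample is assumed, and `pause_stats` is a hypothetical helper rather than part of any Cassandra or JVM tooling.

```python
import re

# The first four pause lines from the log above, one entry per line.
LOG = """\
2015-07-21T14:19:54.336+0000: 2876133.270: Total time for which application threads were stopped: 0.0832030 seconds
2015-07-21T14:19:55.739+0000: 2876134.673: Total time for which application threads were stopped: 0.0806960 seconds
2015-07-21T14:19:57.149+0000: 2876136.083: Total time for which application threads were stopped: 0.0806890 seconds
2015-07-21T14:19:58.550+0000: 2876137.484: Total time for which application threads were stopped: 0.0821070 seconds
"""

# Captures the JVM uptime in seconds and the length of the pause.
PAUSE = re.compile(
    r"(\d+\.\d+): Total time for which application threads were stopped: "
    r"(\d+\.\d+) seconds"
)


def pause_stats(log_text):
    """Return (mean seconds between pauses, fraction of wall time stopped)."""
    samples = [(float(up), float(p)) for up, p in PAUSE.findall(log_text)]
    if len(samples) < 2:
        raise ValueError("need at least two pause lines")
    window = samples[-1][0] - samples[0][0]
    # Count only the pauses logged after the first line, so that the pauses
    # and the window cover the same stretch of time.
    stopped = sum(p for _, p in samples[1:])
    return window / (len(samples) - 1), stopped / window


mean_gap, fraction = pause_stats(LOG)
print(f"pause every {mean_gap:.2f}s, {fraction:.1%} of time stopped")
```

On these four lines the pauses come about 1.4 seconds apart and account for just under 6% of wall-clock time, which fits the picture of eden filling up and being collected over and over.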


On Mon, Jul 20, 2015 at 9:18 AM, Jason Wee peich...@gmail.com wrote:
 just a guess, gc?

 On Mon, Jul 20, 2015 at 3:15 PM, Marcin Pietraszek mpietras...@opera.com
 wrote:

 Hello!

 I've noticed strange CPU utilisation patterns on machines in our
 cluster. After a C* daemon restart a node behaves normally, but a
 few weeks after the restart CPU usage starts to rise. Currently on one
 of the nodes (screenshots attached) the CPU load is ~4. Shortly before a
 restart the load rises to ~15 (our Cassandra machines have 16 CPUs).

 In that cluster we're bulkloading from a Hadoop cluster with 1400
 reducers (200 parallel bulkloading tasks). After such a session of heavy
 bulkloading the number of pending compactions is quite high, but the
 cluster is able to clear them before the next bulkloading session. We're
 also tracking the number of pending compactions, and most of the time it's 0.

 On our machines we have a few gigs of free memory, ~7GB (17GB used),
 and it seems we aren't IO bound.

 Screenshots from our zabbix with CPU utilisation graphs:

 http://i60.tinypic.com/xas8q8.jpg
 http://i58.tinypic.com/24pifcy.jpg

 Do you guys know what could be causing such high load?

 --
 mp




Re: Schema questions for data structures with recently-modified access patterns

2015-07-21 Thread Victor
I'm relatively new to data modeling in Cassandra, but perhaps instead of
date and last_modified in your primary key for doc_by_last_modified, just
use the docId. This way, you can update the last_modified and date
fields against the docId, which removes the duplicate issue and obviates
the need to delete the current row or add a new one; you'd simply be
updating (upserting?) by the docId.

Regards,
Victor
