[Ann] Cassandra Interpreter for Zeppelin

2015-07-22 Thread DuyHai Doan
Hello

 I'm pleased to announce a Cassandra interpreter for Apache Zeppelin. For
those who don't know, Apache Zeppelin[1] is a web-based notebook that
enables interactive data analytics. It is similar to IPython/Jupyter but is
JVM-based and its architecture is modular enough to allow various
back-ends (Spark, HBase, Lens, ...) to be plugged in.

 This Cassandra-Zeppelin integration is not meant to replace a complete
suite like Tableau Software, QlikView or similar but it offers at least
some user-friendly web-based interface for interactive data visualization.

 The Zeppelin project and community are young but very promising. They plan
to add more graph capabilities and features in the future.

The JIRA has been created here[2]. If you're interested in playing with it,
please vote on the JIRA so that the pull request can be merged quickly.
Feedback is also welcome.

A brief description of what can be done with this interpreter:

- support single-line and multi-line comments
- one CQL statement can span many lines
- a @prefix system to pass in runtime parameters to queries
- support for preparing statements beforehand and injecting bound values
into prepared statements
- parallel execution of paragraphs
- the last statement is displayed as tabular data if it is a SELECT
statement. For non-SELECT statements, execution statistics are returned
- simple syntax validation by the interpreter; CQL syntax validation is
delegated to Cassandra
- support for Zeppelin dynamic forms with the mustache syntax
{{input_name=default value}} or {{select_name=val1 | val2 | ... | valN}}

Detailed documentation and build instructions for the interpreter can be
found here[3]

[1]: http://zeppelin.incubator.apache.org/
[2]: https://issues.apache.org/jira/browse/ZEPPELIN-179
[3]: https://docs.google.com/document/d/1krRrpZ3jKx_EOnALp30R1aAL8_tqCiu3W9oz5og0hDg/pub


 Regards

Duy Hai DOAN


RE: Can't connect to Cassandra server

2015-07-22 Thread Peer, Oded
Setting system_memory_in_mb to 16 GB means the Cassandra heap size you are 
using is 4 GB.
If you meant to use a 16GB heap you should uncomment the line
#MAX_HEAP_SIZE=4G
And set
MAX_HEAP_SIZE=16G

You should uncomment the HEAP_NEWSIZE setting as well. I would leave it with 
the default setting 800M until you are certain it needs to be changed.
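
For reference, here is a minimal sketch of how the default heap appears to be derived when MAX_HEAP_SIZE is left commented out. It assumes the rule stated in the comments of the 2.x cassandra-env.sh, max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB)); please check your own copy of the script. The function name is hypothetical.

```python
def default_max_heap_mb(system_memory_in_mb: int) -> int:
    """Assumed default rule: max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))."""
    half_capped = min(system_memory_in_mb // 2, 1024)
    quarter_capped = min(system_memory_in_mb // 4, 8192)
    return max(half_capped, quarter_capped)


for ram_gb in (2, 16, 64):
    heap = default_max_heap_mb(ram_gb * 1024)
    print(f"{ram_gb} GB RAM -> {heap} MB heap")
```

Under that assumption, 16 GB of system memory gives a 4096 MB heap, which is why raising system_memory_in_mb alone does not give you a 16 GB heap.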


From: Chamila Wijayarathna [mailto:cdwijayarat...@gmail.com]
Sent: Tuesday, July 21, 2015 9:21 PM
To: Erick Ramirez
Cc: user@cassandra.apache.org
Subject: Re: Can't connect to Cassandra server

Hi Erick,

In cassandra-env.sh,  system_memory_in_mb was set to 2GB, I changed it into 
16GB, but I still get the same issue. Following are my complete system.log 
after changing cassandra-env.sh, and new cassandra-env.sh.

https://gist.githubusercontent.com/cdwijayarathna/5e7e69c62ac09b45490b/raw/f73f043a6cd68eb5e7f93cf597ec514df7ac61ae/log
https://gist.github.com/cdwijayarathna/2665814a9bd3c47ba650

I can't find any output.log in my Cassandra installation.

Thanks

On Tue, Jul 21, 2015 at 4:31 AM, Erick Ramirez er...@ramirez.com.au wrote:
Chamila,

As you can see from the netstat/lsof output, there is nothing listening on port 
9042 because Cassandra has not started yet. This is the reason you are unable 
to connect via cqlsh.

You need to work out first why Cassandra has not started.

With regards to JVM, Oded is referring to the max heap size and new heap size 
you have configured. The suspicion is that you have max heap size set too low 
which is apparent from the heap pressure and GC pattern in the log you provided.

Please provide the gist for the following so we can assist:
- updated system.log
- copy of output.log
- cassandra-env.sh

Cheers,
Erick

Erick Ramirez
About Me: http://about.me/erickramirezonline




--
Chamila Dilshan Wijayarathna,
Software Engineer
Mobile:(+94)788193620
WSO2 Inc., http://wso2.com/



Re: howto do sql query like in a relational database

2015-07-22 Thread Carlos Rolo
Hello Anton,

You need to look into the DataStax Enterprise (DSE) offering. It integrates
Solr search, which allows you to do searches like the one you mention. There
are also some open-source projects doing this kind of integration, so it's up
to you.

And as Oded mentioned Cassandra really shines on key queries.
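
As an aside, a prefix match can sometimes be expressed as a range on a clustering column (name >= 'Cas' AND name < 'Cat' within one partition). This is a minimal sketch of computing those bounds, assuming the text column is a clustering column and the prefix is plain ASCII; the function name and sample data are hypothetical.

```python
from typing import Tuple


def prefix_range(prefix: str) -> Tuple[str, str]:
    """Return (lower, upper) so that lower <= s < upper holds for strings starting with prefix."""
    if not prefix:
        raise ValueError("prefix must not be empty")
    # Bump the last character by one code point to get an exclusive upper bound.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper


lower, upper = prefix_range("Cas")
names = ["Carl", "Cas", "Cassandra", "Castle", "Cat", "Dog"]
matches = [n for n in names if lower <= n < upper]
print(lower, upper, matches)
```

This only helps when the partition key is already known; it is not a general replacement for LIKE.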

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: http://linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Wed, Jul 22, 2015 at 7:39 AM, Peer, Oded oded.p...@rsa.com wrote:

 Cassandra is a highly scalable, eventually consistent, distributed,
 structured key-value store http://wiki.apache.org/cassandra/
 It is intended for searching by key. It has more querying options but it
 really shines when querying by key.

 Not all databases offer the same functionality. Both a knife and a fork
 are eating utensils, but you wouldn't want to cut a tomato with a fork.
 There are text-indexing databases out there that might suit your needs
 better. Try elasticsearch.

 -Original Message-
 From: anton [mailto:anto...@gmx.de]
 Sent: Tuesday, July 21, 2015 7:54 PM
 To: user@cassandra.apache.org
 Subject: howto do sql query like in a relational database

 Hi,

 I have a simple (perhaps stupid) question.

 If I want to *search* data in cassandra, how could I find in a text field
 all records which start with 'Cas'
 ( in sql I do select * from table where field like 'Cas%')

 I know that this is not directly possible.

  - But how is it possible?

  - Does nobody need to search text fragments,
    and if not, is there a small example to explain
    *why* this is not needed?

 As far as I understand, databases are great for *searching* data.
 Concerning numerical data in cassandra I can use <, >, = and all those operators.

 Is cassandra intended to be used for mostly numerical data?

 I did not catch the point up to now, sorry.

  Anton





Re: Schema questions for data structures with recently-modified access patterns

2015-07-22 Thread Carlos Alonso
Ah, so your access pattern is to get all documents modified on a
particular date, right?

Then I think your approach is good, and to avoid duplication, why not add
the docId as the first clustering column and remove the last_modified field
from it?
That way, your primary key would be PRIMARY KEY(date, docId), so all
docs modified on the same day would be together in the same partition, and
two updates on the same date won't generate two rows, as the
primary key would be exactly the same.
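
To illustrate, here is a minimal sketch that models the table as a dictionary keyed by the full primary key (date, docId). It only mimics upsert semantics and is not driver code; the helper names and sample ids are hypothetical.

```python
from typing import Dict, List, Tuple

# One entry per primary key (date, docId), like PRIMARY KEY(date, docId).
Table = Dict[Tuple[str, str], dict]


def upsert(table: Table, date: str, doc_id: str, **columns) -> None:
    """Writing the same (date, docId) twice overwrites instead of adding a row."""
    table.setdefault((date, doc_id), {}).update(columns)


def docs_modified_on(table: Table, date: str) -> List[str]:
    """All docIds in the partition for the given date."""
    return sorted(doc_id for (d, doc_id) in table if d == date)


table: Table = {}
upsert(table, "2015-07-22", "doc-a", note="first edit")
upsert(table, "2015-07-22", "doc-a", note="second edit")
upsert(table, "2015-07-22", "doc-b", note="first edit")
upsert(table, "2015-07-23", "doc-a", note="third edit")
print(docs_modified_on(table, "2015-07-22"))
```

Note that a document modified on two different days still appears once in each day's partition, so "at most once" holds per date, not across the whole table.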

Does it make sense?

Carlos Alonso | Software Engineer | @calonso https://twitter.com/calonso

On 21 July 2015 at 18:37, Robert Wille rwi...@fold3.com wrote:

  The time series doesn’t provide the access pattern I’m looking for. No
 way to query recently-modified documents.

  On Jul 21, 2015, at 9:13 AM, Carlos Alonso i...@mrcalonso.com wrote:

  Hi Robert,

  What about modelling it as a time series?

  CREATE TABLE document (
   docId UUID,
   doc TEXT,
   last_modified TIMESTAMP,
   PRIMARY KEY(docId, last_modified)
 ) WITH CLUSTERING ORDER BY (last_modified DESC);

  This way, the latest modification will always be the first record
 in the row, therefore accessing it should be as easy as:

  SELECT * FROM document WHERE docId = <the docId> LIMIT 1;

  And, if you experience diskspace issues due to very long rows, then you
 can always expire old ones using TTL or on a batch job. Tombstones will
 never be a problem in this case as, due to the specified clustering order,
 the latest modification will always be first record in the row.

  Hope it helps.

  Carlos Alonso | Software Engineer | @calonso
 https://twitter.com/calonso

 On 21 July 2015 at 05:59, Robert Wille rwi...@fold3.com wrote:

 Data structures that have a recently-modified access pattern seem to be a
 poor fit for Cassandra. I’m wondering if any of you smart guys can provide
 suggestions.

 For the sake of discussion, let's assume I have the following tables:

 CREATE TABLE document (
 docId UUID,
 doc TEXT,
 last_modified TIMEUUID,
 PRIMARY KEY ((docid))
 )

 CREATE TABLE doc_by_last_modified (
 date TEXT,
 last_modified TIMEUUID,
 docId UUID,
 PRIMARY KEY ((date), last_modified)
 )

 When I update a document, I retrieve its last_modified time, delete the
 current record from doc_by_last_modified, and add a new one. Unfortunately,
 if you’d like each document to appear at most once in the
 doc_by_last_modified table, then this doesn’t work so well.

 Documents can get into the doc_by_last_modified table multiple times if
 there is concurrent access, or if there is a consistency issue.

 Any thoughts out there on how to efficiently provide recently-modified
 access to a table? This problem exists for many types of data structures,
 not just recently-modified. Any ordered data structure that can be
 dynamically reordered suffers from the same problems. As I’ve been doing
 schema design, this pattern keeps recurring. A nice way to address this
 problem has lots of applications.

 Thanks in advance for your thoughts

 Robert






Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Bryan Cheng
Hi there,

Within our Cassandra cluster, we're observing, on occasion, one or two
nodes at a time becoming partially unresponsive.

We're running 2.1.7 across the entire cluster.

nodetool still reports the node as being healthy, and it does respond to
some local queries; however, the CPU is pegged at 100%. One common thread
(heh) each time this happens is that there always seems to be one or more
compaction threads running (via nodetool tpstats), and some appear to be
stuck (active count doesn't change, pending count doesn't decrease). A
request for compactionstats hangs with no response.

Each time we've seen this, the only thing that appears to resolve the issue
is a restart of the Cassandra process; the restart does not appear to be
clean, and requires one or more attempts (or a -9 on occasion).

There does not seem to be any pattern to what machines are affected; the
nodes thus far have been different instances on different physical machines
and on different racks.

Has anyone seen this before? Alternatively, when this happens again, what
data can we collect that would help with the debugging process (in addition
to tpstats)?

Thanks in advance,

Bryan


Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Aiman Parvaiz
Hi Bryan
How's GC behaving on these boxes?

On Wed, Jul 22, 2015 at 2:55 PM, Bryan Cheng br...@blockcypher.com wrote:

 Hi there,

 Within our Cassandra cluster, we're observing, on occasion, one or two
 nodes at a time becoming partially unresponsive.

 We're running 2.1.7 across the entire cluster.

 nodetool still reports the node as being healthy, and it does respond to
 some local queries; however, the CPU is pegged at 100%. One common thread
 (heh) each time this happens is that there always seems to be one or more
 compaction threads running (via nodetool tpstats), and some appear to be
 stuck (active count doesn't change, pending count doesn't decrease). A
 request for compactionstats hangs with no response.

 Each time we've seen this, the only thing that appears to resolve the
 issue is a restart of the Cassandra process; the restart does not appear to
 be clean, and requires one or more attempts (or a -9 on occasion).

 There does not seem to be any pattern to what machines are affected; the
 nodes thus far have been different instances on different physical machines
 and on different racks.

 Has anyone seen this before? Alternatively, when this happens again, what
 data can we collect that would help with the debugging process (in addition
 to tpstats)?

 Thanks in advance,

 Bryan




-- 
*Aiman Parvaiz*
Lead Systems Architect
ai...@flipagram.com
cell: 213-300-6377
http://flipagram.com/apz


Re: Upgraded to Cassandra 2.2.0 nodes not seeing each other

2015-07-22 Thread Carlos Scheidecker
I agree, Michael. I was generating stuff for it again. Looks like they
changed the SSL stack. I came from 2.1.6 to 2.2.0. Thanks.

On Wed, Jul 22, 2015 at 5:45 PM, Michael Shuler mich...@pbandjelly.org
wrote:

 What version of Cassandra did you upgrade to 2.2.0 *from*?

 This would help with looking at config differences, changelogs, etc.

 It seems you have some pretty clear SSL connection errors, according to
 the logs, which at least helps with seeing why the nodes can't talk to each
 other. I'm not terribly familiar with using SSL with Cassandra, but it
 seems clear that you have an incorrect server_encryption_options:
 cipher_suites: configuration.

 --
 Kind regards,
 Michael

 On 07/22/2015 06:33 PM, Carlos Scheidecker wrote:

 Thanks for the reply, Michael!

 Yes, I did follow the upgrade notes.

 I am running Ubuntu 14.04.2 LTS and kernel 3.13.0-57-generic on all nodes.

 I have 4 machines: .31, .32, .33 and .34. If I run nodetool status from
 .34 I now see all the others as DN; the same happens if I log in from the
 others:

 DN  192.168.1.31  ?       256  ?  1f8000f5-026c-42c7-8189-cf19fbede566  RAC1
 DN  192.168.1.32  ?       256  ?  12478d45-3d5e-418b-a0dc-dba6d4307af3  RAC1
 DN  192.168.1.33  ?       256  ?  994172b3-cd36-4558-a4b8-054cfac027f3  RAC1
 UN  192.168.1.34  1.7 MB  256  ?  b66be1f3-bb4a-49bd-9835-5c8ee2a71e5c  RAC1

 If I do a netstat -atn from .34 I get:

 tcp  0  0  127.0.1.1:53        0.0.0.0:*          LISTEN
 tcp  0  0  0.0.0.0:22          0.0.0.0:*          LISTEN
 tcp  0  0  127.0.0.1:631       0.0.0.0:*          LISTEN
 tcp  0  0  192.168.1.34:7001   0.0.0.0:*          LISTEN
 tcp  0  0  127.0.0.1:7199      0.0.0.0:*          LISTEN
 tcp  0  0  192.168.1.34:9160   0.0.0.0:*          LISTEN
 tcp  0  0  127.0.0.1:59441     0.0.0.0:*          LISTEN
 tcp  0  0  192.168.1.34:52951  192.168.1.31:7001  ESTABLISHED

 On the logs I now have the following errors (/var/log/syslog.log):

 WARN  [MessagingService-Outgoing-/192.168.1.31] 2015-07-22 17:29:48,764 SSLFactory.java:163 - Filtering out
 TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
 as it isnt supported by the socket
 ERROR [MessagingService-Outgoing-/192.168.1.31] 2015-07-22 17:29:48,764 OutboundTcpConnection.java:229 - error processing a message intended for /192.168.1.31
 java.lang.NullPointerException: null
 at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213) ~[guava-16.0.jar:na]
 at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.init(BufferedDataOutputStreamPlus.java:74) ~[apache-cassandra-2.2.0.jar:2.2.0]
 at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:404) ~[apache-cassandra-2.2.0.jar:2.2.0]
 at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:218) ~[apache-cassandra-2.2.0.jar:2.2.0]
 ERROR [MessagingService-Outgoing-/192.168.1.31] 2015-07-22 17:29:48,764 OutboundTcpConnection.java:316 - error writing to /192.168.1.31
 java.lang.NullPointerException: null
 at org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:323) [apache-cassandra-2.2.0.jar:2.2.0]
 at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:285) [apache-cassandra-2.2.0.jar:2.2.0]
 at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:219) [apache-cassandra-2.2.0.jar:2.2.0]
 WARN  [MessagingService-Outgoing-/192.168.1.33] 2015-07-22 17:29:49,764 SSLFactory.java:163 - Filtering out
 TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
 as it isnt supported by the socket
 WARN  [MessagingService-Outgoing-/192.168.1.31] 2015-07-22 17:29:49,764 SSLFactory.java:163 - Filtering out
 TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
 as it isnt supported by the socket
 ERROR [MessagingService-Outgoing-/192.168.1.33] 2015-07-22 17:29:49,764 OutboundTcpConnection.java:229 - error processing a message intended for /192.168.1.33
 java.lang.NullPointerException: null
 at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213) ~[guava-16.0.jar:na]
 at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.init(BufferedDataOutputStreamPlus.java:74) ~[apache-cassandra-2.2.0.jar:2.2.0]
 at org.apache.cassandra.net
 

Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Bryan Cheng
Robert, thanks for these references! We're not using DTCS, so 9056 and 8243
seem out, but I'll take a look at 9577 (also looked at the referenced
thread on this list, which seems to have some interesting data)

On Wed, Jul 22, 2015 at 5:33 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jul 22, 2015 at 2:55 PM, Bryan Cheng br...@blockcypher.com
 wrote:

 nodetool still reports the node as being healthy, and it does respond to
 some local queries; however, the CPU is pegged at 100%. One common thread
 (heh) each time this happens is that there always seems to be one or more
 compaction threads running (via nodetool tpstats), and some appear to be
 stuck (active count doesn't change, pending count doesn't decrease). A
 request for compactionstats hangs with no response.


 I've heard other reports of compaction appearing to stall in 2.1.7...
 wondering if you're affected by any of these...

 https://issues.apache.org/jira/browse/CASSANDRA-9577
 or
 https://issues.apache.org/jira/browse/CASSANDRA-9056 or
 https://issues.apache.org/jira/browse/CASSANDRA-8243 (these should not be
 in 2.1.7)

 =Rob




Issues with SSL encryption after updating to 2.2.0 from 2.1.6

2015-07-22 Thread Carlos Scheidecker
Hello all,


After updating to Cassandra 2.2.0 from 2.1.6 I am having SSL issues:

My JVM is java version 1.8.0_45
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)


Ubuntu 14.04.2 LTS is on all nodes, they are the same.

Below are the encryption settings from cassandra.yaml on all nodes.

I am using the same keystore and truststore as I used before on 2.1.6.


# Enable or disable inter-node encryption
# Default settings are TLS v1, RSA 1024-bit keys (it is imperative that
# users generate their own keys) TLS_RSA_WITH_AES_128_CBC_SHA as the cipher
# suite for authentication, key exchange and encryption of the actual data
# transfers.
# Use the DHE/ECDHE ciphers if running in FIPS 140 compliant mode.
# NOTE: No custom encryption options are enabled at the moment
# The available internode options are : all, none, dc, rack
#
# If set to dc cassandra will encrypt the traffic between the DCs
# If set to rack cassandra will encrypt the traffic between the racks
#
# The passwords used in these options must match the passwords used when
# generating the keystore and truststore.  For instructions on generating
# these files, see:
# http://download.oracle.com/javase/6/docs/technotes/guides/security/jsse/JSSERefGuide.html#CreateKeystore
#
server_encryption_options:
    internode_encryption: all
    keystore: /etc/cassandra/certs/node.keystore
    keystore_password: mypasswd
    truststore: /etc/cassandra/certs/global.truststore
    truststore_password: mypasswd
    # More advanced defaults below:
    # protocol: TLS
    # algorithm: SunX509
    # store_type: JKS
    cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
    require_client_auth: false

# enable or disable client/server encryption.


Nodes cannot talk to each other, as per the SSL errors below.

WARN  [MessagingService-Outgoing-/192.168.1.31] 2015-07-22 17:29:48,764
SSLFactory.java:163 - Filtering out
TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
as it isnt supported by the socket
ERROR [MessagingService-Outgoing-/192.168.1.31] 2015-07-22 17:29:48,764
OutboundTcpConnection.java:229 - error processing a message intended for /
192.168.1.31
java.lang.NullPointerException: null
at
com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213)
~[guava-16.0.jar:na]
at
org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.init(BufferedDataOutputStreamPlus.java:74)
~[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:404)
~[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:218)
~[apache-cassandra-2.2.0.jar:2.2.0]
ERROR [MessagingService-Outgoing-/192.168.1.31] 2015-07-22 17:29:48,764
OutboundTcpConnection.java:316 - error writing to /192.168.1.31
java.lang.NullPointerException: null
at
org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:323)
[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:285)
[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:219)
[apache-cassandra-2.2.0.jar:2.2.0]
WARN  [MessagingService-Outgoing-/192.168.1.33] 2015-07-22 17:29:49,764
SSLFactory.java:163 - Filtering out
TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
as it isnt supported by the socket
WARN  [MessagingService-Outgoing-/192.168.1.31] 2015-07-22 17:29:49,764
SSLFactory.java:163 - Filtering out
TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
as it isnt supported by the socket
ERROR [MessagingService-Outgoing-/192.168.1.33] 2015-07-22 17:29:49,764
OutboundTcpConnection.java:229 - error processing a message intended for /
192.168.1.33
java.lang.NullPointerException: null
at
com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213)
~[guava-16.0.jar:na]
at
org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.init(BufferedDataOutputStreamPlus.java:74)
~[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:404)
~[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:218)
~[apache-cassandra-2.2.0.jar:2.2.0]
ERROR [MessagingService-Outgoing-/192.168.1.31] 2015-07-22 17:29:49,764
OutboundTcpConnection.java:229 - error processing a message intended for /
192.168.1.31
java.lang.NullPointerException: null
at
com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213)
~[guava-16.0.jar:na]
at

Re: Cassandra - Spark - Flume: best architecture for log analytics.

2015-07-22 Thread Pierre Devops
Cassandra is not very good at massive/bulk reads if you need to
retrieve and process a large amount of data on multiple machines using
something like Spark or Hadoop (otherwise you'll need to process the
SSTables directly, which is not natively supported, so you'll have
to hack your way through).

However, it's very good at storing and retrieving data once it has been
processed and sorted. That's why I would opt for solution 2), or for another
solution which processes data before inserting it into Cassandra and doesn't
use Cassandra as a temporary store.
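
To make "process before inserting" concrete, here is a minimal sketch of a pre-aggregation step that reduces raw log lines to one count per (minute, level), so only the small aggregated rows would be written to Cassandra. The log line format and function name are assumptions for illustration; in practice this step would live in the Spark job.

```python
from collections import Counter
from typing import Iterable, Tuple


def aggregate(lines: Iterable[str]) -> Counter:
    """Count log lines per (minute, level); assumes 'ISO-timestamp LEVEL message' lines."""
    counts: Counter = Counter()
    for line in lines:
        timestamp, level, _message = line.split(" ", 2)
        minute = timestamp[:16]  # e.g. "2015-07-22T17:29"
        counts[(minute, level)] += 1
    return counts


raw = [
    "2015-07-22T17:29:48 WARN Filtering out cipher suites",
    "2015-07-22T17:29:48 ERROR error writing to peer",
    "2015-07-22T17:29:49 WARN Filtering out cipher suites",
    "2015-07-22T17:30:02 ERROR error writing to peer",
]
rows = aggregate(raw)
for (minute, level), count in sorted(rows.items()):
    print(minute, level, count)
```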

2015-07-23 2:04 GMT+02:00 Renato Perini renato.per...@gmail.com:

 Problem: Log analytics.

 Solutions:
1) Aggregating logs using Flume and storing the aggregations into
 Cassandra. Spark reads data from Cassandra, makes some computations
 and writes the results in distinct tables, still in Cassandra.
2) Aggregating logs using Flume to a sink, streaming data directly
 into Spark. Spark makes some computations and stores the results in Cassandra.
3) *** your solution ***

 Which is the best workflow for this task?
 I would like to setup something flexible enough to allow me to use batch
 processing and realtime streaming without major fuss.

 Thank you in advance.






Re: Schema questions for data structures with recently-modified access patterns

2015-07-22 Thread Jack Krupansky
No way to query recently-modified documents.

I don't follow why you say that. I mean, that was the point of the data
model suggestion I proposed. Maybe you could clarify.

I also wanted to mention that the new materialized view feature of
Cassandra 3.0 might handle this use case, including taking care of the
delete, automatically.


-- Jack Krupansky

On Tue, Jul 21, 2015 at 12:37 PM, Robert Wille rwi...@fold3.com wrote:

  The time series doesn’t provide the access pattern I’m looking for. No
 way to query recently-modified documents.

  On Jul 21, 2015, at 9:13 AM, Carlos Alonso i...@mrcalonso.com wrote:

  Hi Robert,

  What about modelling it as a time series?

  CREATE TABLE document (
   docId UUID,
   doc TEXT,
   last_modified TIMESTAMP,
   PRIMARY KEY(docId, last_modified)
 ) WITH CLUSTERING ORDER BY (last_modified DESC);

  This way, the latest modification will always be the first record
 in the row, therefore accessing it should be as easy as:

  SELECT * FROM document WHERE docId = <the docId> LIMIT 1;

  And, if you experience diskspace issues due to very long rows, then you
 can always expire old ones using TTL or on a batch job. Tombstones will
 never be a problem in this case as, due to the specified clustering order,
 the latest modification will always be first record in the row.

  Hope it helps.

  Carlos Alonso | Software Engineer | @calonso
 https://twitter.com/calonso

 On 21 July 2015 at 05:59, Robert Wille rwi...@fold3.com wrote:

 Data structures that have a recently-modified access pattern seem to be a
 poor fit for Cassandra. I’m wondering if any of you smart guys can provide
 suggestions.

 For the sake of discussion, let's assume I have the following tables:

 CREATE TABLE document (
 docId UUID,
 doc TEXT,
 last_modified TIMEUUID,
 PRIMARY KEY ((docid))
 )

 CREATE TABLE doc_by_last_modified (
 date TEXT,
 last_modified TIMEUUID,
 docId UUID,
 PRIMARY KEY ((date), last_modified)
 )

 When I update a document, I retrieve its last_modified time, delete the
 current record from doc_by_last_modified, and add a new one. Unfortunately,
 if you’d like each document to appear at most once in the
 doc_by_last_modified table, then this doesn’t work so well.

 Documents can get into the doc_by_last_modified table multiple times if
 there is concurrent access, or if there is a consistency issue.

 Any thoughts out there on how to efficiently provide recently-modified
 access to a table? This problem exists for many types of data structures,
 not just recently-modified. Any ordered data structure that can be
 dynamically reordered suffers from the same problems. As I’ve been doing
 schema design, this pattern keeps recurring. A nice way to address this
 problem has lots of applications.

 Thanks in advance for your thoughts

 Robert






Upgraded to Cassandra 2.2.0 nodes not seeing each other

2015-07-22 Thread Carlos Scheidecker
All,

I have a 4 node Cassandra system running on 4 Ubuntu boxes. After updating
to Cassandra 2.2.0 and keeping the same cassandra.yaml file, the nodes
cannot see each other.

When I do a nodetool status, it reports only the machine where I issued
the command as being up.

In other words, none of the machines can communicate with each other any
longer. nodetool status behaves the same on each machine.

I am trying to debug this; hopefully it is only something in the
configuration that has changed.

Any ideas?

Thanks.

C.


Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Bryan Cheng
Hi Aiman,

We previously had issues with GC, but since upgrading to 2.1.7 things seem
a lot healthier.

We collect GC statistics through collectd via the garbage collector mbean.
ParNew GCs report sub-500ms collection time on average (I believe
accumulated per minute?) and CMS peaks at about 300ms collection time when
it runs.

On Wed, Jul 22, 2015 at 3:22 PM, Aiman Parvaiz ai...@flipagram.com wrote:

 Hi Bryan
 How's GC behaving on these boxes?

 On Wed, Jul 22, 2015 at 2:55 PM, Bryan Cheng br...@blockcypher.com
 wrote:

 Hi there,

 Within our Cassandra cluster, we're observing, on occasion, one or two
 nodes at a time becoming partially unresponsive.

 We're running 2.1.7 across the entire cluster.

 nodetool still reports the node as being healthy, and it does respond to
 some local queries; however, the CPU is pegged at 100%. One common thread
 (heh) each time this happens is that there always seems to be one or more
 compaction threads running (via nodetool tpstats), and some appear to be
 stuck (active count doesn't change, pending count doesn't decrease). A
 request for compactionstats hangs with no response.

 Each time we've seen this, the only thing that appears to resolve the
 issue is a restart of the Cassandra process; the restart does not appear to
 be clean, and requires one or more attempts (or a -9 on occasion).

 There does not seem to be any pattern to what machines are affected; the
 nodes thus far have been different instances on different physical machines
 and on different racks.

 Has anyone seen this before? Alternatively, when this happens again, what
 data can we collect that would help with the debugging process (in addition
 to tpstats)?

 Thanks in advance,

 Bryan




 --
 *Aiman Parvaiz*
 Lead Systems Architect
 ai...@flipagram.com
 cell: 213-300-6377
 http://flipagram.com/apz



Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Bryan Cheng
Aiman,

Your post made me look back at our data a bit. The most recent occurrence
of this incident was not preceded by any abnormal GC activity; however, the
previous occurrence (which took place a few days ago) did correspond to a
massive, order-of-magnitude increase in both ParNew and CMS collection
times which lasted ~17 hours.

Was there something in particular that links GC to these stalls? At this
point in time, we cannot identify any particular reason for either that GC
spike or the subsequent apparent compaction stall, although it did not seem
to have any effect on our usage of the cluster.

On Wed, Jul 22, 2015 at 3:35 PM, Bryan Cheng br...@blockcypher.com wrote:

 Hi Aiman,

 We previously had issues with GC, but since upgrading to 2.1.7 things seem
 a lot healthier.

 We collect GC statistics through collectd via the garbage collector mbean.
 ParNew GCs report sub-500ms collection time on average (I believe
 accumulated per minute?) and CMS peaks at about 300ms collection time when
 it runs.

 On Wed, Jul 22, 2015 at 3:22 PM, Aiman Parvaiz ai...@flipagram.com
 wrote:

 Hi Bryan
 How's GC behaving on these boxes?

 On Wed, Jul 22, 2015 at 2:55 PM, Bryan Cheng br...@blockcypher.com
 wrote:

 Hi there,

 Within our Cassandra cluster, we're observing, on occasion, one or two
 nodes at a time becoming partially unresponsive.

 We're running 2.1.7 across the entire cluster.

 nodetool still reports the node as being healthy, and it does respond to
 some local queries; however, the CPU is pegged at 100%. One common thread
 (heh) each time this happens is that there always seems to be one or more
 compaction threads running (via nodetool tpstats), and some appear to be
 stuck (active count doesn't change, pending count doesn't decrease). A
 request for compactionstats hangs with no response.

 Each time we've seen this, the only thing that appears to resolve the
 issue is a restart of the Cassandra process; the restart does not appear to
 be clean, and requires one or more attempts (or a -9 on occasion).

 There does not seem to be any pattern to what machines are affected; the
 nodes thus far have been different instances on different physical machines
 and on different racks.

 Has anyone seen this before? Alternatively, when this happens again,
 what data can we collect that would help with the debugging process (in
 addition to tpstats)?

 Thanks in advance,

 Bryan




 --
 *Aiman Parvaiz*
 Lead Systems Architect
 ai...@flipagram.com
 cell: 213-300-6377
 http://flipagram.com/apz





Re: Upgraded to Cassandra 2.2.0 nodes not seeing each other

2015-07-22 Thread Michael Shuler

What version of Cassandra did you upgrade to 2.2.0 *from*?

This would help with looking at config differences, changelogs, etc.

It seems you have some pretty clear SSL connection errors, according to 
the logs, which at least helps with seeing why the nodes can't talk to 
each other. I'm not terribly familiar with using SSL with Cassandra, but 
it seems clear that you have an incorrect server_encryption_options: 
cipher_suites: configuration.
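
One way to confirm which suites are being dropped is to collect them from the
"Filtering out ..." warnings in the quoted log. The following is a minimal
sketch in Python, assuming the warning text matches the lines quoted in this
thread; the function name filtered_cipher_suites is hypothetical and not part
of Cassandra.

```python
import re
from typing import Set

# Matches the SSLFactory warning as it appears in the quoted log. \s+ also
# matches newlines, so warnings wrapped by the mail client still match.
_FILTERED = re.compile(
    r"Filtering out\s+(\S+)\s+as it isnt supported by the socket"
)


def filtered_cipher_suites(log_text: str) -> Set[str]:
    """Return every cipher suite named in a 'Filtering out' warning."""
    suites: Set[str] = set()
    for match in _FILTERED.finditer(log_text):
        # The suites are logged as one comma-separated token.
        suites.update(s for s in match.group(1).split(",") if s)
    return suites
```

Any suite returned here is one the local socket refused, so it is worth
comparing the result against the cipher_suites list under
server_encryption_options in cassandra.yaml.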


--
Kind regards,
Michael

On 07/22/2015 06:33 PM, Carlos Scheidecker wrote:

Thanks for the reply, Michael!

Yes, I did follow the upgrade notes.

I am running Ubuntu 14.04.2 LTS and
kernel 3.13.0-57-generic on all of them.

I have 4 machines: .31, .32, .33 and .34. If I run nodetool status from
.34, I now see all the others as DN; the same happens when I run it from the
others:

DN  192.168.1.31  ?  256  ?
1f8000f5-026c-42c7-8189-cf19fbede566  RAC1
DN  192.168.1.32  ?  256  ?
12478d45-3d5e-418b-a0dc-dba6d4307af3  RAC1
DN  192.168.1.33  ?  256  ?
994172b3-cd36-4558-a4b8-054cfac027f3  RAC1
UN  192.168.1.34  1.7 MB 256  ?
b66be1f3-bb4a-49bd-9835-5c8ee2a71e5c  RAC1

If I do a netstat -atn from .34 I get:

tcp        0      0 127.0.1.1:53            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN
tcp        0      0 192.168.1.34:7001       0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:7199          0.0.0.0:*               LISTEN
tcp        0      0 192.168.1.34:9160       0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:59441         0.0.0.0:*               LISTEN
tcp        0      0 192.168.1.34:52951      192.168.1.31:7001       ESTABLISHED

On the logs I now have the following errors (/var/log/syslog.log):

WARN  [MessagingService-Outgoing-/192.168.1.31]
2015-07-22 17:29:48,764 SSLFactory.java:163 - Filtering out
TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
as it isnt supported by the socket
ERROR [MessagingService-Outgoing-/192.168.1.31]
2015-07-22 17:29:48,764 OutboundTcpConnection.java:229 - error
processing a message intended for /192.168.1.31
java.lang.NullPointerException: null
at
com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213) 
~[guava-16.0.jar:na]
at
org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.init(BufferedDataOutputStreamPlus.java:74)
~[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:404)
~[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:218)
~[apache-cassandra-2.2.0.jar:2.2.0]
ERROR [MessagingService-Outgoing-/192.168.1.31]
2015-07-22 17:29:48,764 OutboundTcpConnection.java:316 - error writing
to /192.168.1.31
java.lang.NullPointerException: null
at
org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:323)
[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:285)
[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:219)
[apache-cassandra-2.2.0.jar:2.2.0]
WARN  [MessagingService-Outgoing-/192.168.1.33]
2015-07-22 17:29:49,764 SSLFactory.java:163 - Filtering out
TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
as it isnt supported by the socket
WARN  [MessagingService-Outgoing-/192.168.1.31]
2015-07-22 17:29:49,764 SSLFactory.java:163 - Filtering out
TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
as it isnt supported by the socket
ERROR [MessagingService-Outgoing-/192.168.1.33]
2015-07-22 17:29:49,764 OutboundTcpConnection.java:229 - error
processing a message intended for /192.168.1.33
java.lang.NullPointerException: null
at
com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213) 
~[guava-16.0.jar:na]
at
org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.init(BufferedDataOutputStreamPlus.java:74)
~[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:404)
~[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:218)
~[apache-cassandra-2.2.0.jar:2.2.0]
ERROR [MessagingService-Outgoing-/192.168.1.31]
2015-07-22 17:29:49,764 OutboundTcpConnection.java:229 - error

Re: Upgraded to Cassandra 2.2.0 nodes not seeing each other

2015-07-22 Thread Carlos Scheidecker
Thanks for the reply, Michael!

Yes, I did follow the upgrade notes.

I am running Ubuntu 14.04.2 LTS and kernel 3.13.0-57-generic on all of
them.

I have 4 machines: .31, .32, .33 and .34. If I run nodetool status from .34,
I now see all the others as DN; the same happens when I run it from the others:

DN  192.168.1.31  ?  256  ?
1f8000f5-026c-42c7-8189-cf19fbede566  RAC1
DN  192.168.1.32  ?  256  ?
12478d45-3d5e-418b-a0dc-dba6d4307af3  RAC1
DN  192.168.1.33  ?  256  ?
994172b3-cd36-4558-a4b8-054cfac027f3  RAC1
UN  192.168.1.34  1.7 MB 256  ?
b66be1f3-bb4a-49bd-9835-5c8ee2a71e5c  RAC1

If I do a netstat -atn from .34 I get:

tcp        0      0 127.0.1.1:53            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN
tcp        0      0 192.168.1.34:7001       0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:7199          0.0.0.0:*               LISTEN
tcp        0      0 192.168.1.34:9160       0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:59441         0.0.0.0:*               LISTEN
tcp        0      0 192.168.1.34:52951      192.168.1.31:7001       ESTABLISHED

On the logs I now have the following errors (/var/log/syslog.log):

WARN  [MessagingService-Outgoing-/192.168.1.31] 2015-07-22 17:29:48,764
SSLFactory.java:163 - Filtering out
TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
as it isnt supported by the socket
ERROR [MessagingService-Outgoing-/192.168.1.31] 2015-07-22 17:29:48,764
OutboundTcpConnection.java:229 - error processing a message intended for /
192.168.1.31
java.lang.NullPointerException: null
at
com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213)
~[guava-16.0.jar:na]
at
org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.init(BufferedDataOutputStreamPlus.java:74)
~[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:404)
~[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:218)
~[apache-cassandra-2.2.0.jar:2.2.0]
ERROR [MessagingService-Outgoing-/192.168.1.31] 2015-07-22 17:29:48,764
OutboundTcpConnection.java:316 - error writing to /192.168.1.31
java.lang.NullPointerException: null
at
org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:323)
[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:285)
[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:219)
[apache-cassandra-2.2.0.jar:2.2.0]
WARN  [MessagingService-Outgoing-/192.168.1.33] 2015-07-22 17:29:49,764
SSLFactory.java:163 - Filtering out
TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
as it isnt supported by the socket
WARN  [MessagingService-Outgoing-/192.168.1.31] 2015-07-22 17:29:49,764
SSLFactory.java:163 - Filtering out
TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
as it isnt supported by the socket
ERROR [MessagingService-Outgoing-/192.168.1.33] 2015-07-22 17:29:49,764
OutboundTcpConnection.java:229 - error processing a message intended for /
192.168.1.33
java.lang.NullPointerException: null
at
com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213)
~[guava-16.0.jar:na]
at
org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.init(BufferedDataOutputStreamPlus.java:74)
~[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:404)
~[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:218)
~[apache-cassandra-2.2.0.jar:2.2.0]
ERROR [MessagingService-Outgoing-/192.168.1.31] 2015-07-22 17:29:49,764
OutboundTcpConnection.java:229 - error processing a message intended for /
192.168.1.31
java.lang.NullPointerException: null
at
com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213)
~[guava-16.0.jar:na]
at
org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.init(BufferedDataOutputStreamPlus.java:74)
~[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:404)
~[apache-cassandra-2.2.0.jar:2.2.0]
at
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:218)
~[apache-cassandra-2.2.0.jar:2.2.0]
ERROR [MessagingService-Outgoing-/192.168.1.31] 2015-07-22 17:29:50,763
OutboundTcpConnection.java:316 - error writing to /192.168.1.31
java.lang.NullPointerException: null
at
org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:323)

RE: Schema questions for data structures with recently-modified access patterns

2015-07-22 Thread Alec Collier
I believe what he really wants is to be able to search for the x most recently 
modified documents, i.e. without specifying the docID.

I don’t believe there is a ‘nice’ way of doing this in Cassandra by itself, 
given it really favours key-value storage. Even having the date as the 
partition key is usually not recommended because it means all writes on a given 
date will be hitting one node.

Perhaps Solr integration is the way to go for this access pattern?

Alec Collier

From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
Sent: Thursday, 23 July 2015 8:20 AM
To: user@cassandra.apache.org
Subject: Re: Schema questions for data structures with recently-modified access 
patterns

"No way to query recently-modified documents."

I don't follow why you say that. I mean, that was the point of the data model 
suggestion I proposed. Maybe you could clarify.

I also wanted to mention that the new materialized view feature of Cassandra 
3.0 might handle this use case, including taking care of the delete, 
automatically.


-- Jack Krupansky

On Tue, Jul 21, 2015 at 12:37 PM, Robert Wille 
rwi...@fold3.com wrote:
The time series doesn’t provide the access pattern I’m looking for. No way to 
query recently-modified documents.

On Jul 21, 2015, at 9:13 AM, Carlos Alonso 
i...@mrcalonso.com wrote:


Hi Robert,

What about modelling it as a time series?

CREATE TABLE document (
  docId UUID,
  doc TEXT,
  last_modified TIMESTAMP,
  PRIMARY KEY (docId, last_modified)
) WITH CLUSTERING ORDER BY (last_modified DESC);

This way, the latest modification will always be the first record in the
row, so accessing it should be as easy as:

SELECT * FROM document WHERE docId = ? LIMIT 1;  -- bind ? to the docId

And if you experience disk space issues due to very long rows, you can
always expire old ones using a TTL or a batch job. Tombstones will never be a
problem in this case because, due to the specified clustering order, the latest
modification will always be the first record in the row.

Hope it helps.

Carlos Alonso | Software Engineer | @calonso (https://twitter.com/calonso)

On 21 July 2015 at 05:59, Robert Wille 
rwi...@fold3.com wrote:
Data structures that have a recently-modified access pattern seem to be a poor 
fit for Cassandra. I’m wondering if any of you smart guys can provide 
suggestions.

For the sake of discussion, let's assume I have the following tables:

CREATE TABLE document (
  docId UUID,
  doc TEXT,
  last_modified TIMEUUID,
  PRIMARY KEY ((docId))
);

CREATE TABLE doc_by_last_modified (
  date TEXT,
  last_modified TIMEUUID,
  docId UUID,
  PRIMARY KEY ((date), last_modified)
);

When I update a document, I retrieve its last_modified time, delete the current 
record from doc_by_last_modified, and add a new one. Unfortunately, if you’d 
like each document to appear at most once in the doc_by_last_modified table, 
then this doesn’t work so well.

Documents can get into the doc_by_last_modified table multiple times if there 
is concurrent access, or if there is a consistency issue.

Any thoughts out there on how to efficiently provide recently-modified access 
to a table? This problem exists for many types of data structures, not just 
recently-modified. Any ordered data structure that can be dynamically reordered 
suffers from the same problems. As I’ve been doing schema design, this pattern 
keeps recurring. A nice way to address this problem has lots of applications.
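
A possible client-side mitigation for the duplicate entries described above,
which nobody in this thread proposed, is to tolerate duplicates in
doc_by_last_modified and filter them when reading. A minimal sketch in Python,
assuming rows arrive as (last_modified, docId) pairs and that last_modified
values compare in time order; the function name most_recent_unique and the
sample shape are hypothetical.

```python
from typing import Iterable, List, Tuple

Row = Tuple[int, str]  # (last_modified, docId); plain ints stand in for TIMEUUIDs


def most_recent_unique(rows: Iterable[Row], limit: int) -> List[Row]:
    """Return up to `limit` rows, newest first, keeping one row per docId."""
    if limit <= 0:
        return []
    seen = set()
    result: List[Row] = []
    # Sort newest first so the first row seen for a docId is its latest entry.
    for last_modified, doc_id in sorted(rows, reverse=True):
        if doc_id in seen:
            continue  # an older, stale entry for a document already returned
        seen.add(doc_id)
        result.append((last_modified, doc_id))
        if len(result) == limit:
            break
    return result
```

This only hides stale entries from the reader; they still occupy space in the
table until they are deleted or expire.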

Thanks in advance for your thoughts

Robert






Upgraded to Cassandra 2.2.0 nodes not seeing each other

2015-07-22 Thread Carlos Scheidecker
All,

I have a 4 node Cassandra system running on 4 Ubuntu boxes. After updating
to Cassandra 2.2.0 and keeping the same cassandra.yaml file, the nodes
cannot see each other.

When I run nodetool status, it reports only the machine where I issued the
command as being up.

In other words, none of the machines can communicate with each other any
longer. nodetool status behaves the same on each machine.

I am trying to debug that; hopefully it is only something in the
configuration that has changed.

Any ideas?

Thanks.

C.


Re: Upgraded to Cassandra 2.2.0 nodes not seeing each other

2015-07-22 Thread Michael Shuler

On 07/22/2015 04:45 PM, Carlos Scheidecker wrote:

I have a 4 node Cassandra system running on 4 Ubuntu boxes. After
updating to Cassandra 2.2.0 and keeping the same cassandra.yaml file,
the nodes cannot see each other.


What version did you upgrade from?

When upgrading, it is usually a good idea to start with the
default cassandra.yaml from the new version (2.2.0 in your case) and
carry over the necessary items from your old version, e.g. num_tokens,
initial_token, listen_address, broadcast_address, etc. You are perhaps
missing some sort of default setting that 2.2.0 is looking for?
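
To spot settings that exist in only one of the two files, the top-level keys
can be compared. A rough sketch in Python, assuming both files keep their
settings as unindented "key: value" lines; it is not a full YAML parser, and
the function names top_level_keys and compare_configs are hypothetical.

```python
import re
from typing import Set, Tuple

# An uncommented key starting in column 0, e.g. "num_tokens: 256".
_TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*:")


def top_level_keys(yaml_text: str) -> Set[str]:
    """Collect unindented, uncommented keys from cassandra.yaml-style text."""
    keys: Set[str] = set()
    for line in yaml_text.splitlines():
        match = _TOP_LEVEL_KEY.match(line)
        if match:
            keys.add(match.group(1))
    return keys


def compare_configs(old_text: str, new_text: str) -> Tuple[Set[str], Set[str]]:
    """Return (keys only in the new default, keys only in the old file)."""
    old_keys = top_level_keys(old_text)
    new_keys = top_level_keys(new_text)
    return new_keys - old_keys, old_keys - new_keys
```

Keys that appear only in the new default are candidates for settings the old
file never had; keys that appear only in the old file may have been renamed or
removed, which NEWS.txt should confirm.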



When I run nodetool status, it reports only the machine where I issued
the command as being up.

In other words, none of the machines can communicate with each other any
longer. nodetool status behaves the same on each machine.

I am trying to debug that; hopefully it is only something in the
configuration that has changed.

Any ideas?


Anything helpful in the system.log on each of your nodes?

Did you follow all the upgrade notes from your previous release to 2.2.0?

https://github.com/apache/cassandra/blob/cassandra-2.2.0/NEWS.txt

--
Kind regards,
Michael


Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Aiman Parvaiz
I faced something similar in the past, and the reason for nodes becoming
unresponsive intermittently was long GC pauses. That's why I wanted to bring
this to your attention, in case GC pauses are a potential cause.


 On Jul 22, 2015, at 4:32 PM, Bryan Cheng br...@blockcypher.com wrote:
 
 Aiman,
 
 Your post made me look back at our data a bit. The most recent occurrence of 
 this incident was not preceded by any abnormal GC activity; however, the 
 previous occurrence (which took place a few days ago) did correspond to a 
 massive, order-of-magnitude increase in both ParNew and CMS collection times 
 which lasted ~17 hours.
 
 Was there something in particular that links GC to these stalls? At this 
 point in time, we cannot identify any particular reason for either that GC 
 spike or the subsequent apparent compaction stall, although it did not seem 
 to have any effect on our usage of the cluster.
 
 On Wed, Jul 22, 2015 at 3:35 PM, Bryan Cheng br...@blockcypher.com wrote:
 Hi Aiman,
 
 We previously had issues with GC, but since upgrading to 2.1.7 things seem a 
 lot healthier.
 
 We collect GC statistics through collectd via the garbage collector mbean, 
 ParNew GC's report sub 500ms collection time on average (I believe 
 accumulated per minute?) and CMS peaks at about 300ms collection time when 
 it runs.
 
 On Wed, Jul 22, 2015 at 3:22 PM, Aiman Parvaiz ai...@flipagram.com wrote:
 Hi Bryan
 How's GC behaving on these boxes?
 
 On Wed, Jul 22, 2015 at 2:55 PM, Bryan Cheng br...@blockcypher.com wrote:
 Hi there,
 
 Within our Cassandra cluster, we're observing, on occasion, one or two 
 nodes at a time becoming partially unresponsive.
 
 We're running 2.1.7 across the entire cluster.
 
 nodetool still reports the node as being healthy, and it does respond to 
 some local queries; however, the CPU is pegged at 100%. One common thread 
 (heh) each time this happens is that there always seems to be one or more
 compaction threads running (via nodetool tpstats), and some appear to be 
 stuck (active count doesn't change, pending count doesn't decrease). A 
 request for compactionstats hangs with no response.
 
 Each time we've seen this, the only thing that appears to resolve the 
 issue is a restart of the Cassandra process; the restart does not appear 
 to be clean, and requires one or more attempts (or a -9 on occasion).
 
 There does not seem to be any pattern to what machines are affected; the 
 nodes thus far have been different instances on different physical 
 machines and on different racks.
 
 Has anyone seen this before? Alternatively, when this happens again, what 
 data can we collect that would help with the debugging process (in 
 addition to tpstats)?
 
 Thanks in advance,
 
 Bryan
 
 
 
 -- 
 Aiman Parvaiz
 Lead Systems Architect
 ai...@flipagram.com
 cell: 213-300-6377
 http://flipagram.com/apz
 


Cassandra - Spark - Flume: best architecture for log analytics.

2015-07-22 Thread Renato Perini

Problem: Log analytics.

Solutions:
   1) Aggregating logs using Flume and storing the aggregations
in Cassandra. Spark reads data from Cassandra, makes some computations,
and writes the results to distinct tables, still in Cassandra.
   2) Aggregating logs using Flume to a sink, streaming data
directly into Spark. Spark makes some computations and stores the results
in Cassandra.

   3) *** your solution ***

Which is the best workflow for this task?
I would like to set up something flexible enough to allow me to use batch
processing and real-time streaming without major fuss.


Thank you in advance.





Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Robert Coli
On Wed, Jul 22, 2015 at 2:55 PM, Bryan Cheng br...@blockcypher.com wrote:

 nodetool still reports the node as being healthy, and it does respond to
 some local queries; however, the CPU is pegged at 100%. One common thread
 (heh) each time this happens is that there always seems to be one or more
 compaction threads running (via nodetool tpstats), and some appear to be
 stuck (active count doesn't change, pending count doesn't decrease). A
 request for compactionstats hangs with no response.
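
The "stuck" condition described in the quote (active count doesn't change,
pending count doesn't decrease) can be checked by comparing two tpstats
readings taken some time apart. A minimal sketch in Python, assuming the
counts have already been extracted into dictionaries keyed by pool name; the
names PoolCounts and stalled_pools are hypothetical, and this is only a
heuristic since a long but healthy compaction can look the same over a short
interval.

```python
from typing import Dict, List, NamedTuple


class PoolCounts(NamedTuple):
    active: int
    pending: int


def stalled_pools(
    before: Dict[str, PoolCounts], after: Dict[str, PoolCounts]
) -> List[str]:
    """Return pools whose active count is unchanged and whose backlog did not shrink."""
    stalled = []
    for name, old in before.items():
        new = after.get(name)
        if new is None:
            continue  # pool missing from the second reading; nothing to compare
        has_work = old.active > 0 and old.pending > 0
        if has_work and new.active == old.active and new.pending >= old.pending:
            stalled.append(name)
    return sorted(stalled)
```

Taking the two readings several minutes apart, and repeating the check, makes
a false positive from a single slow task less likely.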


I've heard other reports of compaction appearing to stall in 2.1.7...
wondering if you're affected by any of these...

https://issues.apache.org/jira/browse/CASSANDRA-9577
or
https://issues.apache.org/jira/browse/CASSANDRA-9056 or
https://issues.apache.org/jira/browse/CASSANDRA-8243 (these should not be
in 2.1.7)

=Rob