Re: Cassandra Needs to Grow Up by Version Five!

2018-02-18 Thread Jeff Jirsa
Comments inline 


> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman  
> wrote:
> 
> Cassandra feels like an unfinished program to me. The problem is not that 
> it’s open source or cutting edge.  It’s an open source cutting edge program 
> that lacks some of its basic functionality.  We are all stuck addressing 
> fundamental mechanical tasks for Cassandra because the basic code that would 
> do that part has not been contributed yet.
> 
There are probably 2-3 reasons why:

1) Historically the pmc has tried to keep the scope of the project very narrow. 
It’s a database. We don’t ship drivers. We don’t ship developer tools. We don’t 
ship fancy UIs. We ship a database. I think for the most part the narrow vision 
has been for the best, but maybe it’s time to reconsider some of the scope. 

Postgres will autovacuum to prevent wraparound (hopefully),  but everyone I 
know running Postgres uses flexible-freeze in cron - sometimes it’s ok to let 
the database have its opinions and let third party tools fill in the gaps.
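
The cron half of that is a one-liner (a sketch; I'm quoting the flags from 
memory, check the script's --help):

  # postgres user's crontab: freeze for up to two hours in the quiet window
  0 2 * * * flexible_freeze.py --minutes 120 >> /var/log/flexible_freeze.log 2>&1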

2) Cassandra is, by definition, a database for large scale problems. Most of 
the companies working on/with it tend to be big companies. Big companies often 
have pre-existing automation that solved the stuff you consider fundamental 
tasks, so there’s probably nobody actively working on the solved problems that 
you may consider missing features - for many people they’re already solved.

3) It’s not nearly as basic as you think it is. Datastax seemingly had a 
multi-person team on OpsCenter, and while it was better than anything else 
around the last time I used it (before it stopped supporting the OSS version), 
it left a lot to be desired. It would probably take 2-3 engineers working for a 
month to build even a mostly trivial but meaningful, reliable cluster-managing 
UI, and I can think of about 10 JIRAs I’d rather see that time spent on first. 

> Ease of use issues need to be given much more attention.  For an 
> administrator, the ease of use of Cassandra is very poor. 
> 
> Furthermore, currently Cassandra is an idiot.  We have to do everything for 
> Cassandra. Contrast that with the fact that we are in the dawn of artificial 
> intelligence.
> 

And for everything you think is obvious, there’s a 50% chance someone else will 
have already solved it differently, and your obvious new solution will be seen 
as an inconvenient assumption and complexity they won’t appreciate. Open source 
projects have to walk a fine line: trying to be useful without making too many 
assumptions, being “too” opinionated, or overstepping bounds. We may be too 
conservative, but it’s very easy to go too far in the opposite direction. 

> Software exists to automate tasks for humans, not mechanize humans to 
> administer tasks for a database.  I’m an engineering type.  My job is to 
> apply science and technology to solve real world problems.  And that’s where 
> I need an organization’s I.T. talent to focus; not in crank starting an 
> unfinished database.
> 

And that’s why nobody’s done it - we all have bigger problems we’re being paid 
to solve, and nobody’s felt it necessary. Because it’s not necessary; it’s 
nice, but not required.

> For example, I should be able to go to any node, replace the Cassandra.yaml 
> file and have a prompt on the display ask me if I want to update all the yaml 
> files across the cluster.  I shouldn’t have to manually modify yaml files on 
> each node or have to create a script for some third party automation tool to 
> do it. 
> 
I don’t see this ever happening.  Your config management already pushes files 
around your infrastructure, Cassandra doesn’t need to do it. 
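
If you really want push-button yaml distribution, a short shell loop on top of 
tools you already have gets you there (a rough sketch; assumes SSH access, 
systemd packaging, and a cluster-hosts.txt you maintain - all placeholders):

  for host in $(cat cluster-hosts.txt); do
    scp cassandra.yaml "$host":/etc/cassandra/cassandra.yaml
    ssh "$host" 'nodetool drain && sudo systemctl restart cassandra'
    # wait for the restarted node to rejoin before rolling to the next one
    until ssh "$host" 'nodetool info 2>/dev/null | grep -q "Gossip active.*true"'; do sleep 10; done
  done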

> I should not have to turn off service, clear directories, restart service in 
> coordination with the other nodes.  It’s already a computer system.  It can 
> do those things on its own.
> 

The only time you should be doing this is when you’re wiping nodes after a 
failed bootstrap, and that stopped being required in 2.2.
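
And even then it’s three commands per node (a sketch; assumes default package 
paths - adjust to your layout, and note it throws away ALL local data):

  sudo systemctl stop cassandra
  sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
  sudo systemctl start cassandra    # the node re-bootstraps from the ring
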
> How about read repair.  First there is something wrong with the name.  Maybe 
> it should be called Consistency Repair.  An administrator shouldn’t have to 
> do anything.  It should be a behavior of Cassandra that is programmed in. It 
> should consider the GC setting of each node, calculate how often it has to 
> run repair, when it should run it so all the nodes aren’t trying at the same 
> time and when other circumstances indicate it should also run it.
> 
There’s a good argument to be made that something like Reaper should be shipped 
with Cassandra. There’s another good argument that most tools like this end up 
needing some sort of leader election for scheduling, and that goes against a lot 
of the fundamental assumptions in Cassandra (all nodes are equal, etc) - 
solving that problem is probably at least part of why you haven’t seen them 
built into the db. “Leader election is easy” you’ll say, and I’ll laugh and 
tell you about users I know who have DCs go offline.
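
In the meantime, cron gets most people close enough - stagger primary-range 
repairs so the nodes aren’t all repairing at once (a sketch; ‘mykeyspace’ is a 
placeholder):

  # node1's crontab - node2 uses '0 1 * * 1', node3 '0 1 * * 2', and so on
  0 1 * * 0 nodetool repair -pr mykeyspace >> /var/log/cassandra/repair.log 2>&1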

Cassandra Needs to Grow Up by Version Five!

2018-02-18 Thread Kenneth Brotman
Cassandra feels like an unfinished program to me.  The problem is not that
it's open source or cutting edge.  It's an open source cutting edge program
that lacks some of its basic functionality.  We are all stuck addressing
fundamental mechanical tasks for Cassandra because the basic code that would
do that part has not been contributed yet.

Ease of use issues need to be given much more attention.  For an
administrator, the ease of use of Cassandra is very poor.  

Furthermore, currently Cassandra is an idiot.  We have to do everything for
Cassandra. Contrast that with the fact that we are in the dawn of artificial
intelligence.

Software exists to automate tasks for humans, not mechanize humans to
administer tasks for a database.  I'm an engineering type.  My job is to
apply science and technology to solve real world problems.  And that's where
I need an organization's I.T. talent to focus; not in crank starting an
unfinished database.

For example, I should be able to go to any node, replace the Cassandra.yaml
file and have a prompt on the display ask me if I want to update all the
yaml files across the cluster.  I shouldn't have to manually modify yaml
files on each node or have to create a script for some third party
automation tool to do it.  

I should not have to turn off service, clear directories, restart service in
coordination with the other nodes.  It's already a computer system.  It can
do those things on its own.

How about read repair.  First there is something wrong with the name.  Maybe
it should be called Consistency Repair.  An administrator shouldn't have to
do anything.  It should be a behavior of Cassandra that is programmed in. It
should consider the GC setting of each node, calculate how often it has to
run repair, when it should run it so all the nodes aren't trying at the same
time and when other circumstances indicate it should also run it.

Certificate management should be automated.

Cluster wide management should be a big theme in any next major release.
What is a major release?  How many major releases could a program have
before all the coding for basic stuff like installation, configuration and
maintenance is included?

Finish the basic coding of Cassandra, make it easy to use for
administrators, make it smart, and add cluster wide management.  Keep Cassandra
competitive or it will soon be the old Model T we all remember fondly.

I ask the Committee to compile a list of all such items, make a plan, and
commit to including the completed and tested code as part of major release
5.0.  I further ask that release 4.0 not be delayed, and that there then be
an unusually short gap before version 5.0. 

Kenneth Brotman



Re: Cassandra cluster: could not reach linear scalability

2018-02-18 Thread Rahul Singh
If that is the case, you could also try running more stress from another 
machine as well.
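
For example, split the key range so the two clients don’t collide (a sketch; 
the counts are placeholders, size them to your test):

  # machine A
  cassandra-stress write n=500000 -pop seq=1..500000 -node 192.168.1.1,192.168.1.2
  # machine B
  cassandra-stress write n=500000 -pop seq=500001..1000000 -node 192.168.1.1,192.168.1.2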

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 18, 2018, 2:37 PM -0500, Jeff Jirsa , wrote:
> Stress client may be cpu bound as well
>
> --
> Jeff Jirsa
>
>
> On Feb 18, 2018, at 7:40 AM, onmstester onmstester  
> wrote:
>
> > I'm running tests on a separate machine (not a member of the cluster)
> > I'm using the default data model of cassandra-stress tool : keyspace1 and 
> > table: standard1. Nothing special on network or data traffic. Network 
> > capable of 1 G and tested it with iperf.
> > iftop shows a maximum of 48Mbit traffic between nodes in the cluster.
> > Have not seen any warnings in log files.
> > I'm monitoring cassandra during runtime using JVisualVM and never saw any 
> > GC chokepoints, cpu is below 40% always. I just can't understand why 
> > cassandra is limiting the throughput?!
> > Using top, fps and writes per second are not showing any problems
> >
> > Sent using Zoho Mail
> >
> >
> >  On Sun, 18 Feb 2018 18:42:48 +0330 Rahul Singh 
> >  wrote 
> >
> > > Got it.
> > >
> > > Here are some other questions.
> > >
> > > Are you running the test on a separate machine or one of the cluster 
> > > members?
> > >
> > > When configuring cassandra-stress, what data model did you end up using? 
> > > (Do you see data or traffic skew?)
> > >
> > > Do you see any wide partition or tombstone warnings on either node?
> > >
> > > Have you visualized the GC logs using something like VisualVM or 
> > > HubSpot's GC visualizer? This is to see if there are chokepoints in the 
> > > GC cycle.
> > >
> > >
> > > --
> > > Rahul Singh
> > > rahul.si...@anant.us
> > >
> > > Anant Corporation
> > >
> > > On Feb 18, 2018, 9:23 AM -0500, onmstester onmstester 
> > > , wrote:
> > >
> > > > But monitoring cassandra with jmx using JVisualVM shows no problem, 
> > > > less than 30% of heap size used
> > > >
> > > > Sent using Zoho Mail
> > > >
> > > >
> > > >  On Sun, 18 Feb 2018 17:26:59 +0330 Rahul Singh 
> > > >  wrote 
> > > >
> > > > > You don’t have enough memory. That’s just a start.
> > > > >
> > > > > --
> > > > > Rahul Singh
> > > > > rahul.si...@anant.us
> > > > >
> > > > > Anant Corporation
> > > > >
> > > > > On Feb 18, 2018, 6:29 AM -0500, onmstester onmstester 
> > > > > , wrote:
> > > > >
> > > > > > I've configured a simple cluster using two PCs with identical spec:
> > > > > >   CPU: Core i5
> > > > > >   RAM: 8GB DDR3
> > > > > >   Disk: 1TB 5400rpm
> > > > > >   Network: 1 G (I've tested it with iperf, it really is!)
> > > > > >
> > > > > > using the common configs described in many sites including datastax 
> > > > > > itself:
> > > > > > cluster_name: 'MyCassandraCluster'
> > > > > > num_tokens: 256
> > > > > > seed_provider:
> > > > > >   - class_name: org.apache.cassandra.locator.SimpleSeedProvider
> > > > > >     parameters:
> > > > > >       - seeds: "192.168.1.1,192.168.1.2"
> > > > > > listen_address:
> > > > > > rpc_address: 0.0.0.0
> > > > > > endpoint_snitch: GossipingPropertyFileSnitch
> > > > > >
> > > > > > Running stress tool:
> > > > > > cassandra-stress write n=100 -rate threads=1000 -mode native 
> > > > > > cql3 -node 192.168.1.1,192.168.1.2
> > > > > >
> > > > > > Over each node it shows 39K writes/second, but running the same 
> > > > > > stress tool command on a cluster of both nodes shows 45K 
> > > > > > writes/second. I've done all the tuning mentioned by apache and 
> > > > > > datastax. There are many use cases on the net proving Cassandra's 
> > > > > > linear scalability. So what is wrong with my cluster?
> > > > > >
> > > > > > Sent using Zoho Mail
> > > > > >
> > > >
> >
> >


Re: Cassandra data model too many table

2018-02-18 Thread Jeff Jirsa
You’re basically looking to query and aggregate the data arbitrarily - you may 
have better luck using spark or solr pointing to a single backing table in 
Cassandra 
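
E.g., pointing a Spark shell at the cluster via the DataStax connector (a 
sketch; the connector version here is an assumption - match it to your Spark 
and Cassandra versions):

  spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.7 \
    --conf spark.cassandra.connection.host=192.168.1.1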



-- 
Jeff Jirsa


> On Feb 18, 2018, at 3:38 AM, onmstester onmstester  
> wrote:
> 
> I have a single structured row as input with a rate of 10K per second. Each 
> row has 20 columns. Some queries should be answered on these inputs. Because 
> most queries need a different where, group by or order by, the final data 
> model ended up like this:
> primary key for table of query1 : ((column1,column2),column3,column4)
> primary key for table of query2 : ((column3,column4),column2,column1)
> and so on
> 
> I am aware of the limit on the number of tables in the cassandra data model 
> (200 is a warning and 500 would fail). Because for every input row I should 
> do an insert in every table, the final writes per second became big * big 
> data!:
> 
> writes per second = 10K (input) * number of tables (queries) * replication 
> factor
> 
> The main question: am I on the right path? Is it normal to have a table for 
> every query even when the input rate is already so high? Shouldn't I use 
> something like Spark or Hadoop on top instead of relying on the bare data 
> model, or even HBase instead of Cassandra?
> 
> 
> Sent using Zoho Mail
> 
> 
> 
> 


Re: Cassandra cluster: could not reach linear scalability

2018-02-18 Thread Jeff Jirsa
Stress client may be cpu bound as well

-- 
Jeff Jirsa


> On Feb 18, 2018, at 7:40 AM, onmstester onmstester  
> wrote:
> 
> I'm running tests on a separate machine (not a member of the cluster)
> I'm using the default data model of cassandra-stress tool : keyspace1 and 
> table: standard1. Nothing special on network or data traffic. Network capable 
> of 1 G and tested it with iperf.
> iftop shows a maximum of 48Mbit traffic between nodes in the cluster.
> Have not seen any warnings in log files.
> I'm monitoring cassandra during runtime using JVisualVM and never saw any GC 
> chokepoints, cpu is below 40% always. I just can't understand why cassandra 
> is limiting the throughput?!
> Using top, fps and writes per second are not showing any problems
> 
> Sent using Zoho Mail
> 
> 
> 
>  On Sun, 18 Feb 2018 18:42:48 +0330 Rahul Singh 
>  wrote 
> 
> Got it.
> 
> Here are some other questions.
> 
> Are you running the test on a separate machine or one of the cluster members?
> 
> When configuring cassandra-stress, what data model did you end up using? (Do 
> you see data or traffic skew?)
> 
> Do you see any wide partition or tombstone warnings on either node?
> 
> Have you visualized the GC logs using something like VisualVM or HubSpot's GC 
> visualizer? This is to see if there are chokepoints in the GC cycle.
> 
> 
> --
> Rahul Singh
> rahul.si...@anant.us
> 
> Anant Corporation
> 
> On Feb 18, 2018, 9:23 AM -0500, onmstester onmstester , 
> wrote: 
> 
> But monitoring cassandra with jmx using JVisualVM shows no problem, less than 
> 30% of heap size used
> 
> Sent using Zoho Mail
> 
> 
> 
>  On Sun, 18 Feb 2018 17:26:59 +0330 Rahul Singh 
>  wrote 
> 
> You don’t have enough memory. That’s just a start.
> 
> --
> Rahul Singh
> rahul.si...@anant.us
> 
> Anant Corporation
> 
> On Feb 18, 2018, 6:29 AM -0500, onmstester onmstester , 
> wrote:
> 
> I've configured a simple cluster using two PCs with identical spec:
> 
>   CPU: Core i5
>   RAM: 8GB DDR3
>   Disk: 1TB 5400rpm
>   Network: 1 G (I've tested it with iperf, it really is!)
> 
> using the common configs described in many sites including datastax itself:
> 
> cluster_name: 'MyCassandraCluster'
> num_tokens: 256
> seed_provider:
>   - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>     parameters:
>       - seeds: "192.168.1.1,192.168.1.2"
> listen_address:
> rpc_address: 0.0.0.0
> endpoint_snitch: GossipingPropertyFileSnitch
> 
> Running stress tool:
> 
> cassandra-stress write n=100 -rate threads=1000 -mode native cql3 -node 
> 192.168.1.1,192.168.1.2
> 
> Over each node it shows 39K writes/second, but running the same stress tool 
> command on a cluster of both nodes shows 45K writes/second. I've done all the 
> tuning mentioned by apache and datastax. There are many use cases on the net 
> proving Cassandra's linear scalability. So what is wrong with my cluster?
> 
> 
> Sent using Zoho Mail
> 
> 
> 
> 
> 


Re: Cassandra cluster: could not reach linear scalability

2018-02-18 Thread onmstester onmstester
I'm running tests on a separate machine (not a member of the cluster)

I'm using the default data model of cassandra-stress tool : keyspace1 and 
table: standard1. Nothing special on network or data traffic. Network capable 
of 1 G and tested it with iperf.

iftop shows a maximum of 48Mbit traffic between nodes in the cluster.

Have not seen any warnings in log files.

I'm monitoring cassandra during runtime using JVisualVM and never saw any GC 
chokepoints, cpu is below 40% always. I just can't understand why cassandra is 
limiting the throughput?!

Using top, fps and writes per second are not showing any problems



Sent using Zoho Mail






 On Sun, 18 Feb 2018 18:42:48 +0330 Rahul Singh 
rahul.xavier.si...@gmail.com wrote 

Got it.

Here are some other questions.

Are you running the test on a separate machine or one of the cluster members?

When configuring cassandra-stress, what data model did you end up using? (Do 
you see data or traffic skew?)

Do you see any wide partition or tombstone warnings on either node?

Have you visualized the GC logs using something like VisualVM or HubSpot's GC 
visualizer? This is to see if there are chokepoints in the GC cycle.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 18, 2018, 9:23 AM -0500, onmstester onmstester 
onmstes...@zoho.com, wrote: 

But monitoring cassandra with jmx using JVisualVM shows no problem, less than 
30% of heap size used

Sent using Zoho Mail

 On Sun, 18 Feb 2018 17:26:59 +0330 Rahul Singh 
rahul.xavier.si...@gmail.com wrote 

You don’t have enough memory. That’s just a start.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 18, 2018, 6:29 AM -0500, onmstester onmstester 
onmstes...@zoho.com, wrote: 

I've configured a simple cluster using two PCs with identical spec:

  CPU: Core i5
  RAM: 8GB DDR3
  Disk: 1TB 5400rpm
  Network: 1 G (I've tested it with iperf, it really is!)

using the common configs described in many sites including datastax itself:

cluster_name: 'MyCassandraCluster'
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.1.1,192.168.1.2"
listen_address:
rpc_address: 0.0.0.0
endpoint_snitch: GossipingPropertyFileSnitch

Running stress tool:

cassandra-stress write n=100 -rate threads=1000 -mode native cql3 -node 
192.168.1.1,192.168.1.2

Over each node it shows 39K writes/second, but running the same stress tool 
command on a cluster of both nodes shows 45K writes/second. I've done all the 
tuning mentioned by apache and datastax. There are many use cases on the net 
proving Cassandra's linear scalability. So what is wrong with my cluster?

Sent using Zoho Mail


Re: SSTableLoader Question

2018-02-18 Thread Rahul Singh
If you don’t have access to the file, you don’t have access to the file. I’ve 
seen this issue several times, and it’s the easiest low hanging fruit to 
resolve. So figure it out: make sure ownership is cassandra:cassandra from root 
down to the data folder, and either run as root or sudo it.

If it’s been compacted it won’t be there, so you won’t have the file. I’m not 
aware of this event being communicated to sstableloader via SEDA. Besides, the 
sstable that you are loading SHOULD not be live. If you are streaming a live 
sstable, it means you are using sstableloader not as it is designed to be used 
- which is with static files.
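
Something like this usually covers it (a sketch; 10.0.0.1 and the staging path 
are placeholders):

  sudo chown -R cassandra:cassandra /data1/keyspace1
  # snapshot first so compaction can't delete files out from under the stream
  nodetool snapshot -t loadme keyspace1
  mkdir -p /tmp/load/keyspace1/table1
  sudo cp /data1/keyspace1/table1/snapshots/loadme/* /tmp/load/keyspace1/table1/
  sstableloader -d 10.0.0.1 /tmp/load/keyspace1/table1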

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 18, 2018, 9:22 AM -0500, shalom sagges , wrote:
> Not really sure with which user I ran it (root or cassandra), although I 
> don't understand why a permission issue would generate a File Not Found 
> exception.
>
> And in general, what if a file is being streamed and got compacted before the 
> streaming ended. Does Cassandra know how to handle this?
>
> Thanks!
>
> > On Sun, Feb 18, 2018 at 3:58 PM, Rahul Singh  
> > wrote:
> > > Check permissions maybe? Who owns the files vs. who is running 
> > > sstableloader.
> > >
> > > --
> > > Rahul Singh
> > > rahul.si...@anant.us
> > >
> > > Anant Corporation
> > >
> > > On Feb 18, 2018, 4:26 AM -0500, shalom sagges , 
> > > wrote:
> > > > Hi All,
> > > >
> > > > C* version 2.0.14.
> > > >
> > > > I was loading some data to another cluster using SSTableLoader. The 
> > > > streaming failed with the following error:
> > > >
> > > >
> > > > Streaming error occurred
> > > > java.lang.RuntimeException: java.io.FileNotFoundException: 
> > > > /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file 
> > > > or directory)
> > > >     at 
> > > > org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:59)
> > > >     at 
> > > > org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1409)
> > > >     at 
> > > > org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:55)
> > > >     at 
> > > > org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:59)
> > > >     at 
> > > > org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
> > > >     at 
> > > > org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
> > > >     at 
> > > > org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
> > > >     at 
> > > > org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
> > > >     at java.lang.Thread.run(Thread.java:722)
> > > > Caused by: java.io.FileNotFoundException: 
> > > > /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file 
> > > > or directory)
> > > >     at java.io.RandomAccessFile.open(Native Method)
> > > >     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
> > > >     at 
> > > > org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:58)
> > > >     at 
> > > > org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:76)
> > > >     at 
> > > > org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:55)
> > > >     ... 8 more
> > > >  WARN 18:31:35,938 [Stream #7243efb0-1262-11e8-8562-d19d5fe7829c] 
> > > > Stream failed
> > > >
> > > >
> > > >
> > > > Did I miss something when running the load? Was the file suddenly 
> > > > missing due to compaction?
> > > > If so, did I need to disable auto compaction or stop the service 
> > > > beforehand? (didn't find any reference to compaction in the docs)
> > > >
> > > > I know it's an old version, but I didn't find any related bugs on "File 
> > > > not found" exceptions.
> > > >
> > > > Thanks!
> > > >
> > > >
>


Re: Cassandra cluster: could not reach linear scalability

2018-02-18 Thread Rahul Singh
Got it.

Here are some other questions.

Are you running the test on a separate machine or one of the cluster members?

When configuring cassandra-stress, what data model did you end up using? (Do 
you see data or traffic skew?)

Do you see any wide partition or tombstone warnings on either node?

Have you visualized the GC logs using something like VisualVM or HubSpot's GC 
visualizer? This is to see if there are chokepoints in the GC cycle.
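
If GC logging isn't enabled yet, two lines in cassandra-env.sh will get you a 
log those tools can read (a sketch; these are the Java 7/8 CMS-era flags):

  JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
  JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime"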


--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 18, 2018, 9:23 AM -0500, onmstester onmstester , 
wrote:
> But monitoring cassandra with jmx using JVisualVM shows no problem, less than 
> 30% of heap size used
>
> Sent using Zoho Mail
>
>
>  On Sun, 18 Feb 2018 17:26:59 +0330 Rahul Singh 
>  wrote 
>
> > You don’t have enough memory. That’s just a start.
> >
> > --
> > Rahul Singh
> > rahul.si...@anant.us
> >
> > Anant Corporation
> >
> > On Feb 18, 2018, 6:29 AM -0500, onmstester onmstester 
> > , wrote:
> >
> > > I've configured a simple cluster using two PCs with identical spec:
> > >   CPU: Core i5
> > >   RAM: 8GB DDR3
> > >   Disk: 1TB 5400rpm
> > >   Network: 1 G (I've tested it with iperf, it really is!)
> > >
> > > using the common configs described in many sites including datastax 
> > > itself:
> > > cluster_name: 'MyCassandraCluster'
> > > num_tokens: 256
> > > seed_provider:
> > >   - class_name: org.apache.cassandra.locator.SimpleSeedProvider
> > >     parameters:
> > >       - seeds: "192.168.1.1,192.168.1.2"
> > > listen_address:
> > > rpc_address: 0.0.0.0
> > > endpoint_snitch: GossipingPropertyFileSnitch
> > >
> > > Running stress tool:
> > > cassandra-stress write n=100 -rate threads=1000 -mode native cql3 
> > > -node 192.168.1.1,192.168.1.2
> > >
> > > Over each node it shows 39K writes/second, but running the same stress 
> > > tool command on a cluster of both nodes shows 45K writes/second. I've 
> > > done all the tuning mentioned by apache and datastax. There are many use 
> > > cases on the net proving Cassandra's linear scalability. So what is 
> > > wrong with my cluster?
> > >
> > > Sent using Zoho Mail
> > >
>
>


Re: Cassandra cluster: could not reach linear scalability

2018-02-18 Thread onmstester onmstester
But monitoring cassandra with jmx using JVisualVM shows no problem, less than 
30% of heap size used

Sent using Zoho Mail

 On Sun, 18 Feb 2018 17:26:59 +0330 Rahul Singh 
rahul.xavier.si...@gmail.com wrote 

You don’t have enough memory. That’s just a start.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 18, 2018, 6:29 AM -0500, onmstester onmstester 
onmstes...@zoho.com, wrote: 

I've configured a simple cluster using two PCs with identical spec:

  CPU: Core i5
  RAM: 8GB DDR3
  Disk: 1TB 5400rpm
  Network: 1 G (I've tested it with iperf, it really is!)

using the common configs described in many sites including datastax itself:

cluster_name: 'MyCassandraCluster'
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.1.1,192.168.1.2"
listen_address:
rpc_address: 0.0.0.0
endpoint_snitch: GossipingPropertyFileSnitch

Running stress tool:

cassandra-stress write n=100 -rate threads=1000 -mode native cql3 -node 
192.168.1.1,192.168.1.2

Over each node it shows 39K writes/second, but running the same stress tool 
command on a cluster of both nodes shows 45K writes/second. I've done all the 
tuning mentioned by apache and datastax. There are many use cases on the net 
proving Cassandra's linear scalability. So what is wrong with my cluster?

Sent using Zoho Mail




Re: SSTableLoader Question

2018-02-18 Thread shalom sagges
Not really sure with which user I ran it (root or cassandra), although I
don't understand why a permission issue would generate a File Not Found
exception.

And in general, what if a file is being streamed and gets compacted before
the streaming ends? Does Cassandra know how to handle this?

Thanks!

On Sun, Feb 18, 2018 at 3:58 PM, Rahul Singh 
wrote:

> Check permissions maybe? Who owns the files vs. who is running
> sstableloader.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Feb 18, 2018, 4:26 AM -0500, shalom sagges ,
> wrote:
>
> Hi All,
>
> C* version 2.0.14.
>
> I was loading some data to another cluster using SSTableLoader. The
> streaming failed with the following error:
>
>
> Streaming error occurred
> java.lang.RuntimeException: java.io.FileNotFoundException:
> /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file
> or directory)
>     at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:59)
>     at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1409)
>     at org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:55)
>     at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:59)
>     at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
>     at java.lang.Thread.run(Thread.java:722)
> Caused by: java.io.FileNotFoundException:
> /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file
> or directory)
>     at java.io.RandomAccessFile.open(Native Method)
>     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
>     at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:58)
>     at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:76)
>     at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:55)
>     ... 8 more
>  WARN 18:31:35,938 [Stream #7243efb0-1262-11e8-8562-d19d5fe7829c] Stream
> failed
>
>
>
> Did I miss something when running the load? Was the file suddenly missing
> due to compaction?
> If so, did I need to disable auto compaction or stop the service
> beforehand? (didn't find any reference to compaction in the docs)
>
> I know it's an old version, but I didn't find any related bugs on "File
> not found" exceptions.
>
> Thanks!
>
>
>


Re: SSTableLoader Question

2018-02-18 Thread Rahul Singh
Check permissions maybe? Who owns the files vs. who is running sstableloader.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 18, 2018, 4:26 AM -0500, shalom sagges , wrote:
> Hi All,
>
> C* version 2.0.14.
>
> I was loading some data to another cluster using SSTableLoader. The streaming 
> failed with the following error:
>
>
> Streaming error occurred
> java.lang.RuntimeException: java.io.FileNotFoundException: 
> /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file or 
> directory)
>     at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:59)
>     at 
> org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1409)
>     at 
> org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:55)
>     at 
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:59)
>     at 
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
>     at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
>     at 
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>     at 
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
>     at java.lang.Thread.run(Thread.java:722)
> Caused by: java.io.FileNotFoundException: 
> /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file or 
> directory)
>     at java.io.RandomAccessFile.open(Native Method)
>     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
>     at 
> org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:58)
>     at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:76)
>     at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:55)
>     ... 8 more
>  WARN 18:31:35,938 [Stream #7243efb0-1262-11e8-8562-d19d5fe7829c] Stream 
> failed
>
>
>
> Did I miss something when running the load? Was the file suddenly missing due 
> to compaction?
> If so, did I need to disable auto compaction or stop the service beforehand? 
> (didn't find any reference to compaction in the docs)
>
> I know it's an old version, but I didn't find any related bugs on "File not 
> found" exceptions.
>
> Thanks!
>
>


Re: Cassandra cluster: could not reach linear scalability

2018-02-18 Thread Rahul Singh
You don’t have enough memory. That’s just a start.
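
A quick way to sanity-check that before changing anything (a sketch; assumes 
nodetool and the JDK tools are on the path):

  nodetool info | grep -i heap                          # heap use as Cassandra reports it
  jstat -gcutil "$(pgrep -f CassandraDaemon)" 1000 10   # GC utilization, 10 samples 1s apart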

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 18, 2018, 6:29 AM -0500, onmstester onmstester , 
wrote:
> I've configured a simple cluster using two PCs with identical spec:
>   CPU: Core i5
>   RAM: 8GB DDR3
>   Disk: 1TB 5400rpm
>   Network: 1 G (I've tested it with iperf, it really is!)
>
> using the common configs described in many sites including datastax itself:
> cluster_name: 'MyCassandraCluster'
> num_tokens: 256
> seed_provider:
>   - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>     parameters:
>       - seeds: "192.168.1.1,192.168.1.2"
> listen_address:
> rpc_address: 0.0.0.0
> endpoint_snitch: GossipingPropertyFileSnitch
>
> Running stress tool:
> cassandra-stress write n=100 -rate threads=1000 -mode native cql3 -node 
> 192.168.1.1,192.168.1.2
>
> Over each node it shows 39K writes/second, but running the same stress tool 
> command on a cluster of both nodes shows 45K writes/second. I've done all the 
> tuning mentioned by apache and datastax. There are many use cases on the net 
> proving Cassandra's linear scalability. So what is wrong with my cluster?
>
> Sent using Zoho Mail
>
>


Re: Cassandra data model too many table

2018-02-18 Thread Rahul Singh


What’s the root cause of this many queries? Is this because of multi-tenancy or 
multiple processes?

It’s possible to logically group some of this data if you use collections / 
sets inside a column. That works if the data has a similar structure for a 
similar query.

It’s “semi-normalization”, where you are leveraging the collection / set as a 
way to store the structure and the table as a way to partition and cluster the 
data.

Potentially you’d need some “index” tables that you’d query first to get the 
partitions you need. Would you benefit from creating separate logical clusters?

How much data do these queries return? If not a lot, consider materializing the 
output into more general “cache” tables with set / collection columns, into 
which data is shoved when it is updated via triggers or Spark.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 18, 2018, 6:38 AM -0500, onmstester onmstester , 
wrote:
> I have a single structured row as input with a rate of 10K per second. Each 
> row has 20 columns. Some queries should be answered on these inputs. Because 
> most queries need a different where, group by or order by, the final data 
> model ended up like this:
>
> primary key for table of query1 : ((column1,column2),column3,column4)
> primary key for table of query2 : ((column3,column4),column2,column1)
> and so on
>
> I am aware of the limit on the number of tables in the cassandra data model 
> (200 is a warning and 500 would fail). Because for every input row I should 
> do an insert in every table, the final writes per second became big * big 
> data!:
>
> writes per second = 10K (input) * number of tables (queries) * replication 
> factor
>
> The main question: am I on the right path? Is it normal to have a table for 
> every query even when the input rate is already so high? Shouldn't I use 
> something like Spark or Hadoop on top instead of relying on the bare data 
> model, or even HBase instead of Cassandra?
>
> Sent using Zoho Mail
>


Cassandra data model too many table

2018-02-18 Thread onmstester onmstester
I have a single structured row as input with a rate of 10K per second. Each row 
has 20 columns. Some queries should be answered on these inputs. Because most 
queries need a different where, group by or order by, the final data model 
ended up like this:

  primary key for table of query1 : ((column1,column2),column3,column4)
  primary key for table of query2 : ((column3,column4),column2,column1)
  and so on

I am aware of the limit on the number of tables in the cassandra data model 
(200 is a warning and 500 would fail). Because for every input row I should do 
an insert in every table, the final writes per second became big * big data!:

  writes per second = 10K (input) * number of tables (queries) * replication factor
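
For example, with 10 query tables and RF=3 that would already be 
10K * 10 * 3 = 300K writes per second across the cluster.
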
The main question: am I on the right path? Is it normal to have a table for 
every query even when the input rate is already so high? Shouldn't I use 
something like Spark or Hadoop on top instead of relying on the bare data 
model, or even HBase instead of Cassandra?



Sent using Zoho Mail




Cassandra cluster: could not reach linear scalability

2018-02-18 Thread onmstester onmstester
I've configured a simple cluster using two PCs with identical spec:

  CPU: Core i5
  RAM: 8GB DDR3
  Disk: 1TB 5400rpm
  Network: 1 G (I've tested it with iperf, it really is!)

using the common configs described in many sites including datastax itself:

cluster_name: 'MyCassandraCluster'
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.1.1,192.168.1.2"
listen_address:
rpc_address: 0.0.0.0
endpoint_snitch: GossipingPropertyFileSnitch

Running stress tool:

cassandra-stress write n=100 -rate threads=1000 -mode native cql3 -node 
192.168.1.1,192.168.1.2

Over each node it shows 39K writes/second, but running the same stress tool 
command on a cluster of both nodes shows 45K writes/second. I've done all the 
tuning mentioned by apache and datastax. There are many use cases on the net 
proving Cassandra's linear scalability. So what is wrong with my cluster?



Sent using Zoho Mail


SSTableLoader Question

2018-02-18 Thread shalom sagges
Hi All,

C* version 2.0.14.

I was loading some data to another cluster using SSTableLoader. The
streaming failed with the following error:


Streaming error occurred
java.lang.RuntimeException: java.io.FileNotFoundException:
/data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file or
directory)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:59)
    at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1409)
    at org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:55)
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:59)
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
    at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.FileNotFoundException:
/data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file or
directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:58)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:76)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:55)
    ... 8 more
 WARN 18:31:35,938 [Stream #7243efb0-1262-11e8-8562-d19d5fe7829c] Stream
failed



Did I miss something when running the load? Was the file suddenly missing
due to compaction?
If so, did I need to disable auto compaction or stop the service
beforehand? (didn't find any reference to compaction in the docs)

I know it's an old version, but I didn't find any related bugs on "File not
found" exceptions.

Thanks!