RE: Periodic Anti-Entropy repair
We have encountered issues with very long-running nodetool repair when we ran it node by node on a really large dataset; in some cases it kept running for a week. IMO the strategy you are choosing, repairing ranges with -st and -et, is a good one: it does the same work in small increments whose logs can be analyzed easily. In addition, my suggestion would be to use the -h option to connect to the node from outside, and to take care of the fact that nodetool ring will give negative token values in the 'for' loop. You can go from -2^63 to the first ring value, then from (there + 1) to the next token value. Better not to use i += 2, because token values are not necessarily even numbers.

Regards,
Tarun

From: Anuj Wadehra [mailto:anujw_2...@yahoo.co.in]
Sent: Sunday, May 24, 2015 6:31 AM
To: user@cassandra.apache.org
Subject: Re: Periodic Anti-Entropy repair

You should use nodetool repair -pr on every node to make sure that each range is repaired only once.

Thanks
Anuj Wadehra

From: Brice Argenson bargen...@gmail.com
Date: Sat, 23 May 2015, 12:31 am
Subject: Periodic Anti-Entropy repair

Hi everyone,

We are currently migrating from DSE to Apache Cassandra, and we would like to put in place an automatic, periodic nodetool repair execution to replace the one executed by OpsCenter. I wanted to create a script / service that would run something like this:

    token_rings = `nodetool ring | awk '{print $8}'`
    for (int i = 0; i < token_rings.length; i += 2) {
        `nodetool repair -st token_rings[i] -et token_rings[i+1]`
    }

That script / service would run every week (our gc_grace is 10 days) and would repair all the ranges of the ring one by one. I also looked a bit on Google and found this script: https://github.com/BrianGallew/cassandra_range_repair It seems to do something equivalent, but it also seems to run the repair node by node instead of over the complete ring.
From my understanding, that would mean that the script has to be run for every node of the cluster, and that each token range would be repaired as many times as the number of replicas containing it. Is there something I misunderstand? Which approach is better? How do you handle your periodic anti-entropy repairs? Thanks a lot!
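To make the token arithmetic concrete, here is a minimal sketch (in Python, assuming the Murmur3Partitioner token space of -2^63 .. 2^63-1) of the subrange list Tarun describes: start at -2^63 and go up to the first ring token, then from each token + 1 up to the next token. Parsing of `nodetool ring` output is left out, and the nodetool command is only printed, not executed.

```python
# Sketch of the subrange computation described above, assuming the
# Murmur3Partitioner token space [-2**63, 2**63 - 1].
MIN_TOKEN = -2**63

def repair_subranges(ring_tokens):
    """Return (start, end) pairs for `nodetool repair -st <start> -et <end>`.

    Follows the scheme from the email: -2**63 up to the first token, then
    (token + 1) up to the next token. Tokens are arbitrary signed 64-bit
    integers, so stepping by a fixed amount (i += 2) is not safe. The final
    wrapping range above the highest token is not emitted here, matching
    the description in the email.
    """
    tokens = sorted(ring_tokens)
    # First range: from the minimum token up to the first ring token.
    ranges = [(MIN_TOKEN, tokens[0])]
    # Then each (previous + 1, next) pair.
    for prev, nxt in zip(tokens, tokens[1:]):
        ranges.append((prev + 1, nxt))
    return ranges

if __name__ == "__main__":
    for st, et in repair_subranges([-9000, -10, 4000]):
        print(f"nodetool repair -st {st} -et {et}")
```

Each emitted pair can then be fed to nodetool in a loop, one small repair at a time.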
Cluster imbalance caused due to #Num_Tokens
Hi,

While setting up a cluster for our POC, we set num_tokens: 256 when we installed Cassandra on the 1st node, while on the next 2 nodes, which were added later, we left it blank in cassandra.yaml. This made our cluster unbalanced, with nodetool status showing 99% load on one server. Now even if I set num_tokens to 256 on the other 2 nodes, it does not seem to take effect. The wiki article http://wiki.apache.org/cassandra/VirtualNodes/Balance doesn't seem to provide steps to correct this situation. I read that there was a nodetool balance-like command in Cassandra 0.7, but not anymore.

    UN Node3 23.72 MB    1   0.4%  41a71df-7e6c-40ab-902f-237697eaaf3e  rack1
    UN Node2 79.35 MB    1   0.5%  98c493b-f661-491e-9d1f-1803f859528b  rack1
    UN Node1 86.93 MB  256  99.1%  a35ccca-556c-4f77-aa6d-7e3dad41ecf8  rack1

Is there something we can do now to balance the cluster?

Regards,
Tarun
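For what it's worth, changing num_tokens on a node that has already bootstrapped has no effect on the tokens it owns, which matches what is observed here. A commonly used recovery path for a POC cluster (a sketch only, under the assumption that the data on Node2/Node3 is disposable or fully replicated elsewhere; paths and service names vary by install) is to re-bootstrap each single-token node with num_tokens set before it rejoins:

```
# Hypothetical sketch -- run on Node2, then repeat on Node3.
nodetool decommission              # stream this node's data off and leave the ring
sudo service cassandra stop
# wipe the old single-token state
rm -rf /var/lib/cassandra/data /var/lib/cassandra/commitlog /var/lib/cassandra/saved_caches
# in cassandra.yaml: set num_tokens: 256 and ensure initial_token is unset
sudo service cassandra start       # node re-bootstraps with 256 vnodes
```

Doing one node at a time keeps the cluster available while each node re-bootstraps.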
RE: Can cqlsh COPY command be run through
Thanks. That was the kind of logical guess I was having on it. Thanks for confirming.

From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: Wednesday, April 08, 2015 1:05 AM
To: user@cassandra.apache.org
Subject: Re: Can cqlsh COPY command be run through

Short answer is no. Whenever you access the session object of the Java driver directly (using withSessionDo {...}), you bypass the data-locality optimisation made by the connector.

On Sun, Apr 5, 2015 at 9:53 AM, Tiwari, Tarun tarun.tiw...@kronos.com wrote:

Hi,

I am looking to find out whether the cqlsh COPY command can be run from a Spark Scala program, and whether it benefits from the parallelism achieved by Spark. I am doing something like below:

    val conf = new SparkConf(true).setMaster("spark://Master-Host:7077")
      .setAppName("Load Cs Table using COPY TO")
    lazy val sc = new SparkContext(conf)
    import com.datastax.spark.connector.cql.CassandraConnector
    CassandraConnector(conf).withSessionDo { session =>
      session.execute("truncate wfcdb.test_wfctotal;")
      session.execute("COPY wfcdb.test_wfctotal (wfctotalid, timesheetitemid, employeeid, durationsecsqty, wageamt, moneyamt, applydtm, laboracctid, paycodeid, startdtm, stimezoneid, adjstartdtm, adjapplydtm, enddtm, homeaccountsw, notpaidsw, wfcjoborgid, unapprovedsw, durationdaysqty, updatedtm, totaledversion, acctapprovalnum) FROM '/home/analytics/Documents/wfctotal.dat' WITH DELIMITER = '|' AND HEADER = true;")
    }

Regards,
Tarun Tiwari | Workforce Analytics-ETL | Kronos India
M: +91 9540 28 27 77 | Tel: +91 120 4015200
Kronos | Time & Attendance • Scheduling • Absence Management • HR & Payroll • Hiring • Labor Analytics
Join Kronos on: kronos.com | Facebook | Twitter | LinkedIn | YouTube
Can cqlsh COPY command be run through
Hi,

I am looking to find out whether the cqlsh COPY command can be run from a Spark Scala program, and whether it benefits from the parallelism achieved by Spark. I am doing something like below:

    val conf = new SparkConf(true).setMaster("spark://Master-Host:7077")
      .setAppName("Load Cs Table using COPY TO")
    lazy val sc = new SparkContext(conf)
    import com.datastax.spark.connector.cql.CassandraConnector
    CassandraConnector(conf).withSessionDo { session =>
      session.execute("truncate wfcdb.test_wfctotal;")
      session.execute("COPY wfcdb.test_wfctotal (wfctotalid, timesheetitemid, employeeid, durationsecsqty, wageamt, moneyamt, applydtm, laboracctid, paycodeid, startdtm, stimezoneid, adjstartdtm, adjapplydtm, enddtm, homeaccountsw, notpaidsw, wfcjoborgid, unapprovedsw, durationdaysqty, updatedtm, totaledversion, acctapprovalnum) FROM '/home/analytics/Documents/wfctotal.dat' WITH DELIMITER = '|' AND HEADER = true;")
    }

Regards,
Tarun Tiwari | Workforce Analytics-ETL | Kronos India
RE: Getting NoClassDefFoundError for com/datastax/spark/connector/mapper/ColumnMapper
Yes, it seems it was not taking the classpath for the Cassandra connector. I added it via the driver-class-path argument but got into another error. I used the command below:

    spark-submit --class ldCassandraTable ./target/scala-2.10/merlin-spark-cassandra-poc_2.10-0.0.1.jar /home/analytics/Documents/test_wfctotal.dat test_wfctotal --driver-class-path /home/analytics/Installers/spark-cassandra-connector-1.1.1/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.1.1.jar

and am now getting a new error:

    Spark assembly has been built with Hive, including Datanucleus jars on classpath
    15/04/03 13:46:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    :/home/analytics/Installers/spark-cassandra-connector-1.1.1/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.1.1.jar:/home/analytics/Installers/spark-1.1.1/conf:/home/analytics/Installers/spark-1.1.1/assembly/target/scala-2.10/spark-assembly-1.1.1-hadoop1.0.4.jar:/home/analytics/Installers/spark-1.1.1/lib_managed/jars/datanucleus-rdbms-3.2.1.jar:/home/analytics/Installers/spark-1.1.1/lib_managed/jars/datanucleus-core-3.2.2.jar:/home/analytics/Installers/spark-1.1.1/lib_managed/jars/datanucleus-api-jdo-3.2.1.jar
    15/04/03 13:46:46 WARN LoadSnappy: Snappy native library not loaded
    Records Loaded to
    15/04/03 13:46:54 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(NODE02.int.kronos.com,60755) not found

From: Dave Brosius [mailto:dbros...@mebigfatguy.com]
Sent: Friday, April 03, 2015 9:15 AM
To: user@cassandra.apache.org
Subject: Re: Getting NoClassDefFoundError for com/datastax/spark/connector/mapper/ColumnMapper

This is what I meant by 'initial cause':

    Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.mapper.ColumnMapper

So it is in fact a classpath problem. Here is the class in question:

https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/mapper/ColumnMapper.scala

Maybe it would be worthwhile to put this at the top of your main method and show what it prints:

    System.out.println(System.getProperty("java.class.path"));

What version of Cassandra and what version of the cassandra-spark connector are you using, btw?

On 04/02/2015 11:16 PM, Tiwari, Tarun wrote:

Sorry, I was unable to reply for a couple of days. I checked the error again and can't see any other initial cause. Here is the full error:

    Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper
        at ldCassandraTable.main(ld_Cassandra_tbl_Job.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.mapper.ColumnMapper
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

From: Dave Brosius [mailto:dbros...@mebigfatguy.com]
Sent: Tuesday, March 31, 2015 8:46 PM
To: user@cassandra.apache.org
Subject: Re: Getting NoClassDefFoundError for com/datastax/spark/connector/mapper/ColumnMapper

Is there an 'initial cause' listed under that exception you gave? NoClassDefFoundError is not exactly the same as ClassNotFoundException: it means that ColumnMapper couldn't run its static initializer. It could be because some other class couldn't be found, or it could be some other non-classloader-related error.

On 2015-03-31 10:42, Tiwari, Tarun wrote:

Hi Experts,

I am getting java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper while running an app to load data to a Cassandra table using the DataStax Spark connector. Is there something else I need to import in the program or dependencies?

RUNTIME ERROR:

    Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper
        at ldCassandraTable.main(ld_Cassandra_tbl_Job.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Below is my Scala program: /*** ld_Cassandra_Table.scala ***/ import
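One thing that may be worth checking in the spark-submit invocation earlier in this thread (an observation about the command as pasted, not something confirmed on the list): spark-submit stops parsing its own options at the application jar, and everything after the jar is passed to the application as arguments. For --driver-class-path to take effect, it has to appear before the jar, e.g.:

```
spark-submit --class ldCassandraTable \
  --driver-class-path /home/analytics/Installers/spark-cassandra-connector-1.1.1/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.1.1.jar \
  ./target/scala-2.10/merlin-spark-cassandra-poc_2.10-0.0.1.jar \
  /home/analytics/Documents/test_wfctotal.dat test_wfctotal
```

With the option after the jar, the driver would only see the connector if it reached the classpath some other way.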
RE: Getting NoClassDefFoundError for com/datastax/spark/connector/mapper/ColumnMapper
Sorry, I was unable to reply for a couple of days. I checked the error again and can't see any other initial cause. Here is the full error:

    Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper
        at ldCassandraTable.main(ld_Cassandra_tbl_Job.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.mapper.ColumnMapper
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

From: Dave Brosius [mailto:dbros...@mebigfatguy.com]
Sent: Tuesday, March 31, 2015 8:46 PM
To: user@cassandra.apache.org
Subject: Re: Getting NoClassDefFoundError for com/datastax/spark/connector/mapper/ColumnMapper

Is there an 'initial cause' listed under that exception you gave? NoClassDefFoundError is not exactly the same as ClassNotFoundException: it means that ColumnMapper couldn't run its static initializer. It could be because some other class couldn't be found, or it could be some other non-classloader-related error.

On 2015-03-31 10:42, Tiwari, Tarun wrote:

Hi Experts,

I am getting java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper while running an app to load data to a Cassandra table using the DataStax Spark connector. Is there something else I need to import in the program or dependencies?

RUNTIME ERROR:

    Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper
        at ldCassandraTable.main(ld_Cassandra_tbl_Job.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Below is my Scala program:

    /*** ld_Cassandra_Table.scala ***/
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf
    import com.datastax.spark.connector
    import com.datastax.spark.connector._

    object ldCassandraTable {
      def main(args: Array[String]) {
        val fileName = args(0)
        val tblName = args(1)
        val conf = new SparkConf(true).set("spark.cassandra.connection.host", "MASTER HOST")
          .setMaster("MASTER URL")
          .setAppName("LoadCassandraTableApp")
        val sc = new SparkContext(conf)
        sc.addJar("/home/analytics/Installers/spark-cassandra-connector-1.1.1/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.1.1.jar")
        val normalfill = sc.textFile(fileName).map(line => line.split('|'))
        normalfill.map(line => (line(0), line(1), line(2), line(3), line(4), line(5), line(6), line(7), line(8), line(9), line(10), line(11), line(12), line(13), line(14), line(15), line(16), line(17), line(18), line(19), line(20), line(21))).saveToCassandra("keyspace", tblName, SomeColumns("wfctotalid", "timesheetitemid", "employeeid", "durationsecsqty", "wageamt", "moneyamt", "applydtm", "laboracctid", "paycodeid", "startdtm", "stimezoneid", "adjstartdtm", "adjapplydtm", "enddtm", "homeaccountsw", "notpaidsw", "wfcjoborgid", "unapprovedsw", "durationdaysqty", "updatedtm", "totaledversion", "acctapprovalnum"))
        println("Records Loaded to %s".format(tblName))
        Thread.sleep(500)
        sc.stop()
      }
    }

Below is the sbt file:

    name := "POC"
    version := "0.0.1"
    scalaVersion := "2.10.4"
    // additional libraries
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.1.1" % "provided",
      "org.apache.spark" %% "spark-sql" % "1.1.1" % "provided",
      "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.1" % "provided"
    )

Regards,
Tarun Tiwari | Workforce Analytics-ETL | Kronos India
Getting NoClassDefFoundError for com/datastax/spark/connector/mapper/ColumnMapper
Hi Experts,

I am getting java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper while running an app to load data to a Cassandra table using the DataStax Spark connector. Is there something else I need to import in the program or dependencies?

RUNTIME ERROR:

    Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper
        at ldCassandraTable.main(ld_Cassandra_tbl_Job.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Below is my Scala program:

    /*** ld_Cassandra_Table.scala ***/
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf
    import com.datastax.spark.connector
    import com.datastax.spark.connector._

    object ldCassandraTable {
      def main(args: Array[String]) {
        val fileName = args(0)
        val tblName = args(1)
        val conf = new SparkConf(true).set("spark.cassandra.connection.host", "MASTER HOST")
          .setMaster("MASTER URL")
          .setAppName("LoadCassandraTableApp")
        val sc = new SparkContext(conf)
        sc.addJar("/home/analytics/Installers/spark-cassandra-connector-1.1.1/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.1.1.jar")
        val normalfill = sc.textFile(fileName).map(line => line.split('|'))
        normalfill.map(line => (line(0), line(1), line(2), line(3), line(4), line(5), line(6), line(7), line(8), line(9), line(10), line(11), line(12), line(13), line(14), line(15), line(16), line(17), line(18), line(19), line(20), line(21))).saveToCassandra("keyspace", tblName, SomeColumns("wfctotalid", "timesheetitemid", "employeeid", "durationsecsqty", "wageamt", "moneyamt", "applydtm", "laboracctid", "paycodeid", "startdtm", "stimezoneid", "adjstartdtm", "adjapplydtm", "enddtm", "homeaccountsw", "notpaidsw", "wfcjoborgid", "unapprovedsw", "durationdaysqty", "updatedtm", "totaledversion", "acctapprovalnum"))
        println("Records Loaded to %s".format(tblName))
        Thread.sleep(500)
        sc.stop()
      }
    }

Below is the sbt file:

    name := "POC"
    version := "0.0.1"
    scalaVersion := "2.10.4"
    // additional libraries
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.1.1" % "provided",
      "org.apache.spark" %% "spark-sql" % "1.1.1" % "provided",
      "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.1" % "provided"
    )

Regards,
Tarun Tiwari | Workforce Analytics-ETL | Kronos India
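A plausible root cause for the NoClassDefFoundError above (an assumption based on the sbt file, not confirmed in the thread): spark-cassandra-connector is marked % "provided", but unlike spark-core and spark-sql it is not actually shipped with the Spark runtime, so it is absent from the driver's classpath unless the assembly jar is supplied separately at submit time. One possible fix is to drop the "provided" scope for the connector only (or keep it and always pass the assembly jar via --jars / --driver-class-path):

```
// sketch: only Spark's own artifacts are truly "provided" by the runtime
libraryDependencies ++= Seq(
  "org.apache.spark"   %% "spark-core"                % "1.1.1" % "provided",
  "org.apache.spark"   %% "spark-sql"                 % "1.1.1" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.1"
)
```

With the connector unscoped, building a fat jar (e.g. with sbt-assembly) bundles it into the application jar and the sc.addJar workaround becomes unnecessary.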
Way to Cassandra File System
Hi All,

In DSE they claim to have the Cassandra File System in place of Hadoop, which makes it really fault tolerant. Is there a way to use the Cassandra File System (CFS) in place of HDFS if I don't have DSE?

Regards,
Tarun Tiwari | Workforce Analytics-ETL | Kronos India
RE: Way to Cassandra File System
Cool, I think that helps.

Regards,
Tarun

From: Jonathan Lacefield [mailto:jlacefi...@datastax.com]
Sent: Tuesday, March 24, 2015 6:39 PM
To: user@cassandra.apache.org
Subject: Re: Way to Cassandra File System

Hello,

CFS is a DataStax proprietary implementation of the Hadoop File System interface/abstract base class (sorry, I don't remember which off the top of my head). You could create your own implementation if you do not want to use DataStax's CFS, or you could purchase DataStax Enterprise. Hope this provides clarity for you.

Thanks,

Jonathan Lacefield
Director - Consulting, Americas | (404) 822 3487 | jlacefi...@datastax.com

On Tue, Mar 24, 2015 at 9:01 AM, Tiwari, Tarun tarun.tiw...@kronos.com wrote:

Hi All,

In DSE they claim to have the Cassandra File System in place of Hadoop, which makes it really fault tolerant. Is there a way to use the Cassandra File System (CFS) in place of HDFS if I don't have DSE?
Regards,
Tarun Tiwari | Workforce Analytics-ETL | Kronos India