Got exception running Sqoop: org.apache.cassandra.db.marshal.MarshalException: 97 is not recognized as a valid type, while importing data from mysql to cassandra using sqoop
Hi everyone, I am trying to import data into a Cassandra column family from MySQL using Sqoop, and I am getting the following error:

ERROR sqoop.Sqoop: Got exception running Sqoop: org.apache.cassandra.db.marshal.MarshalException: 97 is not recognized as a valid type
org.apache.cassandra.db.marshal.MarshalException: 97 is not recognized as a valid type
        at com.datastax.bdp.util.CompositeUtil.deserialize(CompositeUtil.java:93)
        at com.datastax.bdp.hadoop.cfs.CassandraFileSystemThriftStore.retrieveINode(CassandraFileSystemThriftStore.java:585)
        at com.datastax.bdp.hadoop.cfs.CassandraFileSystemThriftStore.retrieveINode(CassandraFileSystemThriftStore.java:563)
        at com.datastax.bdp.hadoop.cfs.CassandraFileSystem.getFileStatus(CassandraFileSystem.java:520)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768)
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:103)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
        at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:119)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:179)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:423)
        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:97)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
        at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)

Could anyone help me solve this issue?

Thanks,
Rajesh Kumar
Re: DROP keyspace doesn't delete the files
bump

On Jul 29, 2012, at 11:03 PM, Drew Kutcharian wrote:
> Hi,
>
> What's the correct procedure to drop a keyspace? When I drop a keyspace, the
> files of that keyspace don't get deleted. There is a JIRA on this:
>
> https://issues.apache.org/jira/browse/CASSANDRA-4075
>
> Is this a bug, or am I missing something?
>
> I'm using Cassandra 1.1.2 on Ubuntu Linux with Sun JVM 1.6, 64-bit
>
> Thanks,
> Drew
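[For reference, one thing to check while that JIRA is open: in 1.1 the `auto_snapshot` option (on by default) makes DROP KEYSPACE snapshot the data first, so the sstables stay on disk until the snapshot is cleared. A possible cleanup, assuming a keyspace name of `MyKeyspace`:]

```shell
# If auto_snapshot is enabled (the 1.1 default), DROP KEYSPACE snapshots the
# data first; the on-disk files can then be reclaimed by clearing snapshots.
nodetool -h localhost clearsnapshot MyKeyspace
```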
Re: Mixed cluster node with version 1.1.2 and 1.0.6 gives errors
Thanks for pointing me to the solution. So that means I should upgrade the 1.0.6 cluster to 1.0.11 first, and then upgrade to 1.1.2. Am I right?

Thanks
/Roshan
Re: Mixed cluster node with version 1.1.2 and 1.0.6 gives errors
Hey,

It is explained here: https://issues.apache.org/jira/browse/CASSANDRA-4195

--
Omid

On Wed, Aug 1, 2012 at 2:39 AM, Roshan wrote:
> Hi
>
> I have a 3 node development cluster, all running version 1.0.6 without any
> issue. As part of the upgrade to 1.1.2, I just upgraded one node to 1.1.2.
> When the upgraded 1.1.2 node starts, the other 1.0.6 nodes get the below
> exceptions:
>
> 2012-08-01 18:31:15,990 INFO [IncomingTcpConnection] Received connection from newer protocol version. Ignorning
> 2012-08-01 18:31:16,008 INFO [Gossiper] Node /10.1.161.202 has restarted, now UP
> 2012-08-01 18:31:16,008 INFO [Gossiper] InetAddress /10.1.161.202 is now UP
> 2012-08-01 18:31:16,010 ERROR [AbstractCassandraDaemon] Fatal exception in thread Thread[GossipStage:1,5,main]
> java.lang.UnsupportedOperationException: Not a time-based UUID
>         at java.util.UUID.timestamp(UUID.java:308)
>         at org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:98)
>         at org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:81)
>         at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:807)
>         at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:850)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:909)
>         at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-08-01 18:31:16,013 ERROR [AbstractCassandraDaemon] Fatal exception in thread Thread[GossipStage:1,5,main]
> java.lang.UnsupportedOperationException: Not a time-based UUID
>         at java.util.UUID.timestamp(UUID.java:308)
>         at org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:98)
>         at org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:81)
>         at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:807)
>         at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:850)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:909)
>         at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-08-01 18:31:16,383 INFO [StorageService] Node /10.1.161.202 state jump to normal
> 2012-08-01 18:32:17,132 ERROR [AbstractCassandraDaemon] Fatal exception in thread Thread[HintedHandoff:1,1,main]
> java.lang.RuntimeException: Could not reach schema agreement with /10.1.161.202 in 6ms
>         at org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:224)
>         at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:239)
>         at org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:81)
>         at org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:353)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-08-01 18:32:17,133 ERROR [AbstractCassandraDaemon] Fatal exception in thread Thread[HintedHandoff:1,1,main]
> java.lang.RuntimeException: Could not reach schema agreement with /10.1.161.202 in 6ms
>         at org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:224)
>         at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:239)
>         at org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:81)
>         at org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:353)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at
Joining DR nodes in new data center
What is the process for joining a new data center to an existing cluster as DR? We have a 5 node cluster in our primary DC, and want to bring up 5 more nodes in our 2nd data center purely for DR. How should these new nodes be joined to the cluster and be seen as the 2nd data center? Do the new nodes mirror the configuration of the existing nodes but with some setting to indicate they are in another DC? Our existing cluster mostly uses the defaults for placement strategy and the simple snitch. Thanks.
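[For reference, the usual approach is a datacenter-aware snitch plus NetworkTopologyStrategy. A sketch under assumed names and addresses (everything below is hypothetical; verify against the docs for your Cassandra version):]

```
# cassandra.yaml on every node: replace SimpleSnitch with a DC-aware snitch
endpoint_snitch: PropertyFileSnitch

# conf/cassandra-topology.properties (the same file on every node)
# existing primary DC
10.0.1.1=DC1:RAC1
10.0.1.2=DC1:RAC1
# new DR data center
10.0.2.1=DC2:RAC1
10.0.2.2=DC2:RAC1
default=DC1:RAC1
```

[Each keyspace then also needs to move to NetworkTopologyStrategy with per-DC replica counts, e.g. strategy_options along the lines of {DC1:3, DC2:2}, and the new nodes have to be populated (bootstrap/repair) once they have joined.]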
Re: composite table with cassandra without using cql3?
For how to do it with Astyanax, you can see lines 310 and 335 here:
https://github.com/deanhiller/nosqlORM/blob/indexing/input/javasrc/com/alvazan/orm/layer3/spi/db/cassandra/CassandraSession.java

For how to do it with Thrift, you could look at Astyanax. I use it on that project for indexing for the ORM layer we use (which is not listed on the Cassandra ORMs page as of yet ;) ).

Later,
Dean

On 8/2/12 9:50 AM, "Greg Fausak" wrote:
> I've been using CQL3 to create a composite table.
> Can I use the Thrift interface to accomplish the
> same thing? In other words, do I have to use CQL3 to
> get a composite table type? (The same behavior as
> multiple PRIMARY KEY columns.)
>
> Thanks,
> ---greg
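[To make the Thrift route concrete: the storage engine just sees a composite column name as a packed byte string, so over Thrift you can build the name yourself against a column family whose comparator is CompositeType. Below is a minimal sketch of that packing (the class name and component values are made up; in practice, clients like Astyanax or Hector ship ready-made composite serializers that are safer to use than hand-rolling this):]

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class CompositeEncoder {
    // Packs components using CompositeType's layout: for each component,
    // a 2-byte big-endian length, the raw bytes, then one end-of-component byte.
    public static byte[] encode(byte[]... components) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] c : components) {
            out.write((c.length >> 8) & 0xFF); // length, high byte
            out.write(c.length & 0xFF);        // length, low byte
            out.write(c, 0, c.length);         // component bytes
            out.write(0);                      // end-of-component: 0 = exact match
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical two-component column name, e.g. (date, user).
        byte[] name = encode("2012-08-02".getBytes(StandardCharsets.UTF_8),
                             "greg".getBytes(StandardCharsets.UTF_8));
        System.out.println(name.length); // 14 data bytes + 2*(2+1) overhead = 20
    }
}
```

[The resulting byte array is what you would pass as the column name in a Thrift insert; slice queries then use the same packing with the end-of-component byte adjusted for range bounds.]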
composite table with cassandra without using cql3?
I've been using CQL3 to create a composite table. Can I use the Thrift interface to accomplish the same thing? In other words, do I have to use CQL3 to get a composite table type? (The same behavior as multiple PRIMARY KEY columns.) Thanks, ---greg
Re: Is large number of columns per row a problem?
Hi,

On Thursday, 2 August 2012 at 11:47, Owen Davies wrote:
> We want to store a large number of columns in a single row (up to about
> 100,000,000), where each value is roughly 10 bytes.
>
> We also need to be able to get slices of columns from any point in the row.
>
> We haven't found a problem with smaller amounts of data so far, but can
> anyone think of any reason if this is a bad idea, or would cause large
> performance problems?

My experience with wide rows and Cassandra is not positive. We used to have rows of a few hundred megabytes each, to be read during MapReduce computation, and that caused many issues, especially timeouts reading the rows (with Cassandra under a medium write load) and OutOfMemory exceptions.

The solution in our case was to "shard" (timebucket) the rows into smaller pieces (a few megabytes each). The situation might have changed with Cassandra 1.1.0, which claims to have some "wide row" support, but I haven't been able to test that.

> If breaking up the row is something we should do, what is the maximum number
> of columns we should have?
>
> We are not too worried if there is only a small performance decrease; adding
> more nodes to the cluster would be an option to help keep the code simpler.

I don't have a precise figure, but I'd limit row size to less than 100MB, and much less if possible. In general, my experience is that hundreds of millions of small rows don't cause issues, but having just a few very wide rows will cause timeouts and, in the worst cases, OOM.

--
Filippo Diotalevi
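[The time-bucketing Filippo describes can be as simple as folding a bucket number into the row key. A minimal sketch, with all names and the bucket width hypothetical (pick a width that keeps each row at a few MB for your data rate):]

```java
import java.util.concurrent.TimeUnit;

public class RowBucketing {
    // Hypothetical bucket width; tune so each row stays a few MB.
    static final long BUCKET_MILLIS = TimeUnit.HOURS.toMillis(1);

    // Derive the sharded row key for an event: same base key, plus the
    // time bucket its timestamp falls into.
    public static String rowKey(String baseKey, long eventMillis) {
        long bucket = eventMillis / BUCKET_MILLIS;
        return baseKey + ":" + bucket;
    }

    public static void main(String[] args) {
        System.out.println(rowKey("sensor42", 7_200_000L)); // third hour -> "sensor42:2"
    }
}
```

[A slice "from any point in the row" then becomes a column slice within the bucket covering that point, continuing into subsequent buckets until enough columns are read.]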
Re: RE Restore snapshot
Then http://www.datastax.com/docs/1.1/operations/backup_restore should mention it :-)

Sylvain Lebresne wrote on 02/08/2012 11:45:46:
> Actually that's wrong: it is perfectly possible to restore a snapshot
> on a live Cassandra cluster.
> There are basically two solutions:
> 1) use sstableloader (http://www.datastax.com/dev/blog/bulk-loading)
> 2) copy the snapshot sstables into the right place and call the JMX
> method loadNewSSTables() (in the column family MBean, which means you
> need to do that per-CF).
>
> --
> Sylvain
Is large number of columns per row a problem?
We want to store a large number of columns in a single row (up to about 100,000,000), where each value is roughly 10 bytes. We also need to be able to get slices of columns from any point in the row. We haven't found a problem with smaller amounts of data so far, but can anyone think of any reason if this is a bad idea, or would cause large performance problems? If breaking up the row is something we should do, what is the maximum number of columns we should have? We are not too worried if there is only a small performance decrease, adding more nodes to the cluster would be an option to help make code simpler. Thanks, Owen Davies
Re: RE Restore snapshot
1) I assume that I have to call loadNewSSTables() on each node? Is this the same as "nodetool refresh"?
RE: RE Restore snapshot
Great! I will use the hardlinks to 'restore' the data files on each node (super fast)! I have some related questions:

1) I assume that I have to call loadNewSSTables() on each node?
2) To be on the safe side, I guess I had better drop the existing keyspace and then recreate it using the definition at the time of the snapshot. But is it allowed to copy the 'old' data files after that, with respect to new internal ids versus ids maintained (if any) in the data files?
3) A quick look at the code (took 1.0.5): is it possible that Table.open is also calling initCaches on the CFs, but loadNewSSTables is not?
4) As a solution to 3): I'm working with embedded Cassandra servers, so I think it would be possible for me to do the following:
* Drop KS x if present
* Create KS x from old definition
* On each node:
*** Table.clear(x)
*** Delete any remaining files in the directory x
*** Restore data files from snapshot for KS x
*** Table.open(x)

-----Original Message-----
From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: Thursday, 2 August 2012 11:46
To: user@cassandra.apache.org
Subject: Re: RE Restore snapshot

Actually that's wrong: it is perfectly possible to restore a snapshot on a live Cassandra cluster. There are basically two solutions:
1) use sstableloader (http://www.datastax.com/dev/blog/bulk-loading)
2) copy the snapshot sstables into the right place and call the JMX method loadNewSSTables() (in the column family MBean, which means you need to do that per-CF).

--
Sylvain

On Thu, Aug 2, 2012 at 9:16 AM, Romain HARDOUIN wrote:
> No, it's not possible.
>
> "Desimpel, Ignace" wrote on 01/08/2012 14:58:49:
>> Hi,
>>
>> Is it possible to restore a snapshot of a keyspace on a live
>> Cassandra cluster (I mean without restarting)?
Re: RE Restore snapshot
Actually that's wrong: it is perfectly possible to restore a snapshot on a live Cassandra cluster. There are basically two solutions:
1) use sstableloader (http://www.datastax.com/dev/blog/bulk-loading)
2) copy the snapshot sstables into the right place and call the JMX method loadNewSSTables() (in the column family MBean, which means you need to do that per-CF).

--
Sylvain

On Thu, Aug 2, 2012 at 9:16 AM, Romain HARDOUIN wrote:
> No, it's not possible.
>
> "Desimpel, Ignace" wrote on 01/08/2012 14:58:49:
>> Hi,
>>
>> Is it possible to restore a snapshot of a keyspace on a live
>> Cassandra cluster (I mean without restarting)?
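[Concretely, the two options might look like the sketch below. Hosts, paths, and keyspace/CF names are all hypothetical, and both sstableloader's expected directory layout and its flags vary between Cassandra versions, so check the docs for yours:]

```shell
# Option 1: stream the snapshot back into the live cluster with sstableloader.
# The tool infers keyspace/column family from the directory path, so lay the
# snapshot files out accordingly before running it.
sstableloader -d 10.1.1.1 /tmp/restore/MyKeyspace/MyCF

# Option 2: copy the snapshot sstables into each node's live data directory,
# then ask the node to pick them up. "nodetool refresh" wraps the JMX
# loadNewSSTables() call, so it is run per keyspace/CF on each node.
cp /backup/snapshots/MyKeyspace/MyCF/* /var/lib/cassandra/data/MyKeyspace/
nodetool -h localhost refresh MyKeyspace MyCF
```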
RE Restore snapshot
No, it's not possible.

"Desimpel, Ignace" wrote on 01/08/2012 14:58:49:
> Hi,
>
> Is it possible to restore a snapshot of a keyspace on a live
> Cassandra cluster (I mean without restarting)?
Cassandra startup failed due to InstanceAlreadyExistsException of some indexes
Hi,
I have 2 nodes with an RF of 2. I recently added a secondary index (starttimeindex) to one of my column families (alerts) and executed the scrub command, but after restarting both of my nodes I got an InstanceAlreadyExistsException for that index column family. It seems that Cassandra created the index twice or even more times (I actually updated the column family to add the index several times). What should I do to fix it?

I have seen this thread:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3ccaldd-zhuflt3urdsk0ahsmqdj-n1keyxonn4rgzjjz13cag...@mail.gmail.com%3E
but it doesn't help me because both of my nodes have this problem, so neither of them starts up.

INFO 11:15:13,647 Creating new index : ColumnDefinition{name=6964, validator=org.apache.cassandra.db.marshal.LongType, index_type=KEYS, index_name='compressedidindex'}
INFO 11:15:13,657 Creating new index : ColumnDefinition{name=7374617274746f696d65, validator=org.apache.cassandra.db.marshal.LongType, index_type=KEYS, index_name='starttimeindex'}
INFO 11:15:13,662 Creating new index : ColumnDefinition{name=737461727474696d6532, validator=org.apache.cassandra.db.marshal.LongType, index_type=KEYS, index_name='starttimeindex'}
INFO 11:15:13,662 Submitting index build of alerts.starttimeindex for data in SSTableReader(path='/media/data/logcorrelation/alerts/logcorrelation-alerts-hd-2099-Data.db'), SSTableReader(path='/media/data/logcorrelation/alerts/logcorrelation-alerts-hd-2096-Data.db'), SSTableReader(path='/media/data/logcorrelation/alerts/logcorrelation-alerts-hd-2098-Data.db'), SSTableReader(path='/media/data/logcorrelation/alerts/logcorrelation-alerts-hd-2101-Data.db'), SSTableReader(path='/media/data/logcorrelation/alerts/logcorrelation-alerts-hd-2100-Data.db'), SSTableReader(path='/media/data/logcorrelation/alerts/logcorrelation-alerts-hd-2097-Data.db')
ERROR 11:15:13,664 Exception encountered during startup
java.lang.RuntimeException: javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=IndexColumnFamilies,keyspace=logcorrelation,columnfamily=alerts.starttimeindex
        at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:261)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:341)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:318)
        at org.apache.cassandra.db.index.keys.KeysIndex.init(KeysIndex.java:60)
        at org.apache.cassandra.db.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:238)
        at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:247)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:341)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:313)
        at org.apache.cassandra.db.Table.initCf(Table.java:371)
        at org.apache.cassandra.db.Table.<init>(Table.java:304)
        at org.apache.cassandra.db.Table.open(Table.java:119)
        at org.apache.cassandra.db.Table.open(Table.java:97)
        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:204)
        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
Caused by: javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=IndexColumnFamilies,keyspace=logcorrelation,columnfamily=alerts.starttimeindex
        at com.sun.jmx.mbeanserver.Repository.addMBean(Unknown Source)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.internal_addObject(Unknown Source)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(Unknown Source)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(Unknown Source)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(Unknown Source)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(Unknown Source)
        at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:257)
        ... 14 more
java.lang.RuntimeException: javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=IndexColumnFamilies,keyspace=logcorrelation,columnfamily=alerts.starttimeindex
INFO 11:15:13,665 reading saved cache /media/data/saved_caches/logcorrelation-alerts-KeyCache
        at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:261)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:341)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:318)
        at org.apache.cassandra.db.index.keys.KeysIndex.init(KeysIndex.java:60)
        at org.apache.cassandra.db.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:238)
        at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:247)
        at org.apache.cassandra.db.ColumnFamilyS