Got exception running Sqoop: org.apache.cassandra.db.marshal.MarshalException: 97 is not recognized as a valid type, while importing data from mysql to cassandra using sqoop

2012-08-02 Thread rajesh.ba...@orkash.com

Hi Everyone,
I am trying to import data into a Cassandra column family from MySQL and I am
getting the following error.


ERROR sqoop.Sqoop: Got exception running Sqoop: org.apache.cassandra.db.marshal.MarshalException: 97 is not recognized as a valid type
org.apache.cassandra.db.marshal.MarshalException: 97 is not recognized as a valid type
        at com.datastax.bdp.util.CompositeUtil.deserialize(CompositeUtil.java:93)
        at com.datastax.bdp.hadoop.cfs.CassandraFileSystemThriftStore.retrieveINode(CassandraFileSystemThriftStore.java:585)
        at com.datastax.bdp.hadoop.cfs.CassandraFileSystemThriftStore.retrieveINode(CassandraFileSystemThriftStore.java:563)
        at com.datastax.bdp.hadoop.cfs.CassandraFileSystem.getFileStatus(CassandraFileSystem.java:520)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768)
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:103)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
        at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:119)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:179)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:423)
        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:97)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
        at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)

Could anyone help me solve this issue?

Thanks
Rajesh Kumar



Re: DROP keyspace doesn't delete the files

2012-08-02 Thread Drew Kutcharian
bump

On Jul 29, 2012, at 11:03 PM, Drew Kutcharian wrote:

> Hi,
> 
> What's the correct procedure to drop a keyspace? When I drop a keyspace, the 
> files of that keyspace don't get deleted. There is a JIRA on this:
> 
> https://issues.apache.org/jira/browse/CASSANDRA-4075
> 
> Is this a bug, or am I missing something?
> 
> I'm using Cassandra 1.1.2 on Ubuntu Linux with Sun JVM 1.6, 64-bit.
> 
> Thanks,
> 
> Drew
> 



Re: Mixed cluster node with version 1.1.2 and 1.0.6 gives errors

2012-08-02 Thread Roshan
Thanks for pointing me to the solution. So that means I should upgrade the
1.0.6 cluster to 1.0.11 first, and then upgrade to 1.1.2. Am I right?

Thanks

/Roshan





Re: Mixed cluster node with version 1.1.2 and 1.0.6 gives errors

2012-08-02 Thread Omid Aladini
Hey,

It is explained here:

https://issues.apache.org/jira/browse/CASSANDRA-4195

-- Omid

On Wed, Aug 1, 2012 at 2:39 AM, Roshan  wrote:

> Hi
>
> I have a 3-node development cluster, and all nodes run version 1.0.6 without
> any issue. As part of the upgrade to 1.1.2, I just upgraded one node. When the
> upgraded 1.1.2 node starts, the other 1.0.6 nodes get the exceptions below:
>
> 2012-08-01 18:31:15,990 INFO  [IncomingTcpConnection] Received connection from newer protocol version. Ignorning
> 2012-08-01 18:31:16,008 INFO  [Gossiper] Node /10.1.161.202 has restarted, now UP
> 2012-08-01 18:31:16,008 INFO  [Gossiper] InetAddress /10.1.161.202 is now UP
> 2012-08-01 18:31:16,010 ERROR [AbstractCassandraDaemon] Fatal exception in thread Thread[GossipStage:1,5,main]
> java.lang.UnsupportedOperationException: Not a time-based UUID
>         at java.util.UUID.timestamp(UUID.java:308)
>         at org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:98)
>         at org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:81)
>         at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:807)
>         at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:850)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:909)
>         at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-08-01 18:31:16,013 ERROR [AbstractCassandraDaemon] Fatal exception in thread Thread[GossipStage:1,5,main]
> java.lang.UnsupportedOperationException: Not a time-based UUID
>         at java.util.UUID.timestamp(UUID.java:308)
>         at org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:98)
>         at org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:81)
>         at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:807)
>         at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:850)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:909)
>         at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-08-01 18:31:16,383 INFO  [StorageService] Node /10.1.161.202 state jump to normal
> 2012-08-01 18:32:17,132 ERROR [AbstractCassandraDaemon] Fatal exception in thread Thread[HintedHandoff:1,1,main]
> java.lang.RuntimeException: Could not reach schema agreement with /10.1.161.202 in 6ms
>         at org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:224)
>         at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:239)
>         at org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:81)
>         at org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:353)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-08-01 18:32:17,133 ERROR [AbstractCassandraDaemon] Fatal exception in thread Thread[HintedHandoff:1,1,main]
> java.lang.RuntimeException: Could not reach schema agreement with /10.1.161.202 in 6ms
>         at org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:224)
>         at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:239)
>         at org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:81)
>         at org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:353)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at

Joining DR nodes in new data center

2012-08-02 Thread Bryce Godfrey
What is the process for joining a new data center to an existing cluster as DR?

We have a 5-node cluster in our primary DC and want to bring up 5 more nodes in
our 2nd data center purely for DR.  How should these new nodes be joined to the
cluster and be seen as the 2nd data center?  Do the new nodes mirror the
configuration of the existing nodes, but with some setting to indicate they are
in another DC?

Our existing cluster mostly uses the defaults, with the network placement
strategy and the simple snitch.
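
For illustration, a hedged sketch of the kind of snitch configuration that
makes nodes DC-aware, assuming PropertyFileSnitch and made-up addresses; every
node in both data centers would carry the same cassandra-topology.properties,
and a keyspace using NetworkTopologyStrategy can then place replicas per DC:

    # cassandra-topology.properties -- illustrative addresses only
    # existing primary data center
    10.0.0.1=DC1:RAC1
    10.0.0.2=DC1:RAC1
    # new DR data center
    10.1.0.1=DC2:RAC1
    10.1.0.2=DC2:RAC1
    # fallback for nodes not listed above
    default=DC1:RAC1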

Thanks.


Re: composite table with cassandra without using cql3?

2012-08-02 Thread Hiller, Dean
For how to do it with astyanax, you can see here...

Lines 310 and 335

https://github.com/deanhiller/nosqlORM/blob/indexing/input/javasrc/com/alvazan/orm/layer3/spi/db/cassandra/CassandraSession.java


For how to do it with Thrift, you could look at Astyanax.

I use it on that project for the indexing in the ORM layer we use (which is
not listed on the Cassandra ORMs page as of yet ;) ).
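
For a concrete feel for the wire format, here is a minimal hedged sketch (not
taken from the file above; the class and method names are illustrative) that
packs a composite column name by hand for the Thrift API. Per component,
CompositeType expects a 2-byte big-endian length, the raw bytes, then one
end-of-component byte; this assumes a comparator like
CompositeType(UTF8Type, UTF8Type):

    import java.nio.ByteBuffer;
    import java.nio.charset.Charset;

    // Illustrative sketch: hand-pack a composite column name for Thrift.
    public final class CompositePacker {
        private static final Charset UTF8 = Charset.forName("UTF-8");

        public static ByteBuffer pack(String... components) {
            int size = 0;
            for (String c : components)
                size += 2 + c.getBytes(UTF8).length + 1;
            ByteBuffer out = ByteBuffer.allocate(size);
            for (String c : components) {
                byte[] bytes = c.getBytes(UTF8);
                out.putShort((short) bytes.length); // 2-byte component length
                out.put(bytes);                     // component value
                out.put((byte) 0);                  // end-of-component marker
            }
            out.flip();
            return out;
        }
    }

Astyanax ships serializers (e.g. AnnotatedCompositeSerializer) that produce
this encoding for you, so you rarely need to pack it by hand.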

Later,
Dean


On 8/2/12 9:50 AM, "Greg Fausak"  wrote:

>I've been using CQL 3 to create a composite table.
>Can I use the Thrift interface to accomplish the
>same thing? In other words, do I have to use CQL 3 to
>get a composite table type (the same behavior as
>multiple PRIMARY KEY columns)?
>
>Thanks,
>---greg



composite table with cassandra without using cql3?

2012-08-02 Thread Greg Fausak
I've been using CQL 3 to create a composite table.
Can I use the Thrift interface to accomplish the
same thing? In other words, do I have to use CQL 3 to
get a composite table type (the same behavior as
multiple PRIMARY KEY columns)?

Thanks,
---greg


Re: Is large number of columns per row a problem?

2012-08-02 Thread Filippo Diotalevi
Hi,

On Thursday, 2 August 2012 at 11:47, Owen Davies wrote:

> We want to store a large number of columns in a single row (up to about
> 100,000,000), where each value is roughly 10 bytes.
>
> We also need to be able to get slices of columns from any point in the row.
>
> We haven't found a problem with smaller amounts of data so far, but can
> anyone think of any reason why this would be a bad idea, or whether it would
> cause large performance problems?

My experience with wide rows and Cassandra is not positive. We used to have
rows of a few hundred megabytes each, to be read during MapReduce computation,
and that caused many issues, especially timeouts when reading the rows (with
Cassandra under a medium write load) and OutOfMemory exceptions.

The solution in our case was to "shard" (time-bucket) the rows into smaller
pieces (a few megabytes each).
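
A hedged sketch of what that time-bucketing can look like (the key format and
names below are illustrative, not our exact scheme): the row key embeds the
hour, so one logical row becomes one physical row per hour.

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    // Illustrative time-bucketed row keys, e.g. "sensor42:2012080210"
    // for an event on 2 Aug 2012 in hour 10 UTC.
    public final class TimeBucket {
        public static String rowKey(String baseKey, long timestampMillis) {
            SimpleDateFormat hour = new SimpleDateFormat("yyyyMMddHH");
            hour.setTimeZone(TimeZone.getTimeZone("UTC"));
            return baseKey + ":" + hour.format(new Date(timestampMillis));
        }
    }

Reading a time slice then means computing the bucket keys covering the range
and slicing each of those rows.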

The situation might have changed with Cassandra 1.1.0, which claims to have 
some "wide row" support, but I haven't been able to test that.

>  
> If breaking up the row is something we should do, what is the maximum number 
> of columns we should have?
>  
> We are not too worried if there is only a small performance decrease; adding
> more nodes to the cluster would be an option to help make the code simpler.

I don't have a precise figure, but I'd limit row size to less than 100MB… much
less, if possible. In general, my experience is that hundreds of millions of
small rows don't cause issues, but having just a few very wide rows will cause
timeouts and, in the worst cases, OOM.


--  
Filippo Diotalevi



Re: RE Restore snapshot

2012-08-02 Thread Romain HARDOUIN
Then http://www.datastax.com/docs/1.1/operations/backup_restore should 
mention it  :-)

Sylvain Lebresne  wrote on 02/08/2012 11:45:46:

> Actually that's wrong; it is perfectly possible to restore a snapshot
> on a live Cassandra cluster.
> There are basically two solutions:
> 1) use the sstableloader (http://www.datastax.com/dev/blog/bulk-loading)
> 2) copy the snapshot SSTables into the right place and call the JMX
> method loadNewSSTables() (on the column family MBean, which means you
> need to do that per CF).
> 
> --
> Sylvain


Is large number of columns per row a problem?

2012-08-02 Thread Owen Davies
We want to store a large number of columns in a single row (up to about 
100,000,000), where each value is roughly 10 bytes.

We also need to be able to get slices of columns from any point in the row.

We haven't found a problem with smaller amounts of data so far, but can anyone
think of any reason why this would be a bad idea, or whether it would cause
large performance problems?

If breaking up the row is something we should do, what is the maximum number of 
columns we should have?

We are not too worried if there is only a small performance decrease; adding
more nodes to the cluster would be an option to help make the code simpler.

Thanks,

Owen Davies

Re: RE Restore snapshot

2012-08-02 Thread Radim Kolar

> 1) I assume that I have to call the loadNewSSTables() on each node?

Is this the same as "nodetool refresh"?


RE: RE Restore snapshot

2012-08-02 Thread Desimpel, Ignace
Great! I will use hardlinks to 'restore' the data files on each node (super
fast)!
I have some related questions:

1) I assume that I have to call loadNewSSTables() on each node?

2) To be on the safe side, I guess I had better drop the existing keyspace and
then recreate it using the definition at the time of the snapshot. But is it
allowed to copy the 'old' data files after that, with respect to the new
internal ids versus the ids maintained (if any) in the data files?

3) From a quick look at the code (I took 1.0.5): is it possible that Table.open
also calls initCaches on the CFs, but loadNewSSTables does not?

4) As a solution to 3): I'm working with embedded Cassandra servers, so I think
it would be possible for me to do the following (see the sketch after this
list):
   * Drop KS x if present
   * Create KS x from the old definition
   * On each node:
     ** Table.clear(x)
     ** Delete any remaining files in the directory of x
     ** Restore the data files from the snapshot for KS x
     ** Table.open(x)
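
A minimal sketch of that per-node sequence, assuming the 1.0.x internal API
(Table.clear and Table.open as named above; their exact signatures, and the
plain file copy standing in for the hardlink step, are assumptions on my side):

    import java.io.*;
    import org.apache.cassandra.db.Table;

    // Hedged sketch of the per-node restore steps listed above.
    public final class EmbeddedRestore {
        public static void restoreKeyspace(String keyspace, File ksDataDir,
                                           File snapshotDir) throws IOException {
            Table.clear(keyspace);               // unload the keyspace on this node
            File[] leftovers = ksDataDir.listFiles();
            if (leftovers != null)
                for (File f : leftovers)         // delete any remaining data files
                    if (f.isFile()) f.delete();
            File[] snapshots = snapshotDir.listFiles();
            if (snapshots != null)
                for (File f : snapshots)         // bring the snapshot SSTables back
                    if (f.isFile()) copy(f, new File(ksDataDir, f.getName()));
            Table.open(keyspace);                // reopen; also inits the CF caches
        }

        private static void copy(File src, File dst) throws IOException {
            InputStream in = new FileInputStream(src);
            OutputStream out = new FileOutputStream(dst);
            try {
                byte[] buf = new byte[64 * 1024];
                for (int n; (n = in.read(buf)) > 0; )
                    out.write(buf, 0, n);
            } finally {
                in.close();
                out.close();
            }
        }
    }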

-Original Message-
From: Sylvain Lebresne [mailto:sylv...@datastax.com] 
Sent: Thursday, 2 August 2012 11:46
To: user@cassandra.apache.org
Subject: Re: RE Restore snapshot

Actually that's wrong; it is perfectly possible to restore a snapshot on a live
Cassandra cluster.
There are basically two solutions:
1) use the sstableloader (http://www.datastax.com/dev/blog/bulk-loading)
2) copy the snapshot SSTables into the right place and call the JMX method
loadNewSSTables() (on the column family MBean, which means you need to do that
per CF).

--
Sylvain

On Thu, Aug 2, 2012 at 9:16 AM, Romain HARDOUIN  
wrote:
>
> No it's not possible
>
> "Desimpel, Ignace"  a écrit sur 01/08/2012
> 14:58:49 :
>
>> Hi,
>>
>> Is it possible to restore a snapshot of a keyspace on a live 
>> cassandra cluster (I mean without restarting)?
>>


Re: RE Restore snapshot

2012-08-02 Thread Sylvain Lebresne
Actually that's wrong; it is perfectly possible to restore a snapshot
on a live Cassandra cluster.
There are basically two solutions:
1) use the sstableloader (http://www.datastax.com/dev/blog/bulk-loading)
2) copy the snapshot SSTables into the right place and call the JMX
method loadNewSSTables() (on the column family MBean, which means you
need to do that per CF).
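
For solution 2), a hedged sketch of what the JMX call can look like from a
standalone Java client. Host, port, keyspace, and column family names are
placeholders, and the MBean name follows the type=ColumnFamilies naming
pattern visible in this era's logs; adjust for your cluster:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Invoke loadNewSSTables() on one column family's MBean;
    // repeat per CF and per node.
    public final class LoadNewSSTables {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://node1:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName cfMBean = new ObjectName(
                        "org.apache.cassandra.db:type=ColumnFamilies,"
                        + "keyspace=MyKeyspace,columnfamily=MyCF");
                mbs.invoke(cfMBean, "loadNewSSTables",
                           new Object[0], new String[0]);
            } finally {
                connector.close();
            }
        }
    }

(nodetool refresh, asked about elsewhere in this thread, wraps the same per-CF
operation on versions that ship it.)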

--
Sylvain

On Thu, Aug 2, 2012 at 9:16 AM, Romain HARDOUIN
 wrote:
>
> No it's not possible
>
> "Desimpel, Ignace"  a écrit sur 01/08/2012
> 14:58:49 :
>
>> Hi,
>>
>> Is it possible to restore a snapshot of a keyspace on a live
>> cassandra cluster (I mean without restarting)?
>>


RE Restore snapshot

2012-08-02 Thread Romain HARDOUIN
No, it's not possible.

"Desimpel, Ignace"  a écrit sur 01/08/2012 
14:58:49 :

> Hi,
> 
> Is it possible to restore a snapshot of a keyspace on a live 
> cassandra cluster (I mean without restarting)? 
> 

Cassandra startup failed due to InstanceAlreadyExistsException of some indexes

2012-08-02 Thread Kam Nob
Hi,

I have 2 nodes with an RF of 2. I recently added a secondary index
(starttimeindex) to one of my column families (alerts) and executed the scrub
command, but after restarting both of my nodes I got an
InstanceAlreadyExistsException for that index column family. It seems that
Cassandra created the index twice or even more (I actually updated the column
family to add the index several times). What should I do to fix it? I have
seen this thread:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3ccaldd-zhuflt3urdsk0ahsmqdj-n1keyxonn4rgzjjz13cag...@mail.gmail.com%3E
but it doesn't help me, because both of my nodes have this problem, so neither
of them starts up.

 INFO 11:15:13,647 Creating new index : ColumnDefinition{name=6964, validator=org.apache.cassandra.db.marshal.LongType, index_type=KEYS, index_name='compressedidindex'}
 INFO 11:15:13,657 Creating new index : ColumnDefinition{name=7374617274746f696d65, validator=org.apache.cassandra.db.marshal.LongType, index_type=KEYS, index_name='starttimeindex'}
 INFO 11:15:13,662 Creating new index : ColumnDefinition{name=737461727474696d6532, validator=org.apache.cassandra.db.marshal.LongType, index_type=KEYS, index_name='starttimeindex'}
 INFO 11:15:13,662 Submitting index build of alerts.starttimeindex for data in
SSTableReader(path='/media/data/logcorrelation/alerts/logcorrelation-alerts-hd-2099-Data.db'),
SSTableReader(path='/media/data/logcorrelation/alerts/logcorrelation-alerts-hd-2096-Data.db'),
SSTableReader(path='/media/data/logcorrelation/alerts/logcorrelation-alerts-hd-2098-Data.db'),
SSTableReader(path='/media/data/logcorrelation/alerts/logcorrelation-alerts-hd-2101-Data.db'),
SSTableReader(path='/media/data/logcorrelation/alerts/logcorrelation-alerts-hd-2100-Data.db'),
SSTableReader(path='/media/data/logcorrelation/alerts/logcorrelation-alerts-hd-2097-Data.db')
ERROR 11:15:13,664 Exception encountered during startup
java.lang.RuntimeException: javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=IndexColumnFamilies,keyspace=logcorrelation,columnfamily=alerts.starttimeindex
        at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:261)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:341)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:318)
        at org.apache.cassandra.db.index.keys.KeysIndex.init(KeysIndex.java:60)
        at org.apache.cassandra.db.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:238)
        at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:247)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:341)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:313)
        at org.apache.cassandra.db.Table.initCf(Table.java:371)
        at org.apache.cassandra.db.Table.<init>(Table.java:304)
        at org.apache.cassandra.db.Table.open(Table.java:119)
        at org.apache.cassandra.db.Table.open(Table.java:97)
        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:204)
        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
Caused by: javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=IndexColumnFamilies,keyspace=logcorrelation,columnfamily=alerts.starttimeindex
        at com.sun.jmx.mbeanserver.Repository.addMBean(Unknown Source)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.internal_addObject(Unknown Source)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(Unknown Source)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(Unknown Source)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(Unknown Source)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(Unknown Source)
        at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:257)
        ... 14 more
java.lang.RuntimeException: javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=IndexColumnFamilies,keyspace=logcorrelation,columnfamily=alerts.starttimeindex
 INFO 11:15:13,665 reading saved cache /media/data/saved_caches/logcorrelation-alerts-KeyCache
        at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:261)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:341)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:318)
        at org.apache.cassandra.db.index.keys.KeysIndex.init(KeysIndex.java:60)
        at org.apache.cassandra.db.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:238)
        at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:247)
        at org.apache.cassandra.db.ColumnFamilyS