RE: Get few rows by composite key.
I'm not sure about the Hector code (somebody else can chime in here), but to find the keys you're after you can slice to get the keys from AA:BB to BB:AA. Cheers, Steve

From: Michael Cherkasov [mailto:michael.cherka...@gmail.com] Sent: Monday, March 19, 2012 9:30 AM To: user@cassandra.apache.org Subject: Get few rows by composite key.

Hello, Assume that we have a table like this one:

Key      Column names
AA:AA    1:A 1:B 1:C 2:A 2:C
AA:BB    1:C 2:A 2:C
AA:CC    2:A 2:C
AA:DD    1:A 1:B 1:C
BB:AA    1:A 1:B 2:C
BB:BB    1:A 1:B 1:C 2:C
BB:CC    1:A 2:A 2:C
BB:DD    1:A 1:C 2:A 2:C

Is there any way to get the rows whose first key part equals AA and whose second part is greater than or equal to BB? I'm interested in the Hector code.
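Client specifics aside, the ordering behind that slice can be sketched in plain Python (row data taken from the question; slice bounds as suggested above):

```python
from bisect import bisect_left

# Rows keyed by a two-part composite (first, second), sorted the way a
# composite comparator would sort them: component by component.
rows = sorted([
    ("AA", "AA"), ("AA", "BB"), ("AA", "CC"), ("AA", "DD"),
    ("BB", "AA"), ("BB", "BB"), ("BB", "CC"), ("BB", "DD"),
])

def key_slice(start, end):
    """Return keys in [start, end) -- one contiguous range slice."""
    return rows[bisect_left(rows, start):bisect_left(rows, end)]

# First part == "AA" and second part >= "BB": slice from AA:BB up to
# (but not including) BB:AA, per the suggestion above.
print(key_slice(("AA", "BB"), ("BB", "AA")))
# -> [('AA', 'BB'), ('AA', 'CC'), ('AA', 'DD')]
```

Note that a range slice over row keys only returns them in this sorted order under an order-preserving partitioner; with RandomPartitioner, row keys are not stored in comparator order.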
RE: Replacing supercolumns with composite columns; Getting the equivalent of retrieving a list of supercolumns by name
I don't think I can tell my exact column names in many cases. For example most of our queries are for specific keys, and an unknown range of numbers (like key1, key where number 1). How can I set up my slice in this case to retrieve only the columns that match both criteria? Cheers, Steve

From: rajkumar@gmail.com [mailto:rajkumar@gmail.com] On Behalf Of Asil Klin Sent: Wednesday, January 04, 2012 12:21 AM To: user@cassandra.apache.org Subject: Re: Replacing supercolumns with composite columns; Getting the equivalent of retrieving a list of supercolumns by name

@Stephan: in that case, you can easily tell the names of all the columns you want to retrieve, so you can make a query to retrieve that list of composite columns.

@Jeremiah, so where is my best bet? Should I leave the supercolumns as they are for now, since I can't find a good way to use them in case I replace them with composite columns?

On Wed, Jan 4, 2012 at 4:01 AM, Stephen Pope stephen.p...@quest.com wrote: The bonus you're talking about here, how do I apply that? For example, my columns are in the form of number.id such as 4.steve, 4.greg, 5.steve, 5.george. Is there a way to query a slice of numbers with a list of ids? As in, I want all the columns with numbers between 4 and 10 which have ids steve or greg. Cheers, Steve

-Original Message- From: Jeremiah Jordan [mailto:jeremiah.jor...@morningstar.com] Sent: Tuesday, January 03, 2012 3:12 PM To: user@cassandra.apache.org Cc: Asil Klin Subject: Re: Replacing supercolumns with composite columns; Getting the equivalent of retrieving a list of supercolumns by name

The main issue with replacing super columns with composite columns right now is that if you don't know all your sub-column names, you can't select multiple super columns' worth of data in the same query without getting extra stuff. You have to use a slice to get all subcolumns of a given super column, and you can't have disjoint slices, so if you want two super columns in full, you have to get all the other stuff that is in between them, or make two queries. If you know what all of the sub-column names are, you can ask for all of the super/sub column pairs for all of the super columns you want and not get extra data. If you don't need to pull multiple super columns at a time with slices like that, then there isn't really an issue. A bonus of using composite keys like this is that if there is a specific sub column you want from multiple super columns, you can pull all of those out with a single multiget and you don't have to pull the rest of the columns... So there are pros and cons... -Jeremiah

On 01/03/2012 01:58 PM, Asil Klin wrote: I have a super column family which I always use to retrieve a list of supercolumns (with all subcolumns) by name. I am looking forward to replacing all SuperColumns in my schema with composite columns. How could I design the schema so that I could do the equivalent of retrieving a list of supercolumns by name, when using composite columns? (As of now I thought of using the supercolumn name as the first component of the composite name and the subcolumn name as the 2nd component.)
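The contiguous-slice limitation and the by-name "bonus" can be illustrated with a toy model (Python; the column names reuse the number.id scheme from the question above, with made-up data):

```python
# Composite column names (super, sub), kept sorted as a comparator would.
columns = sorted([
    ("4", "greg"), ("4", "steve"),
    ("5", "george"), ("5", "steve"),
    ("6", "greg"),
])

# A slice is one contiguous range: fetching all of super "4" and "6"
# with a single slice from ("4", "") to ("7", "") also drags in the
# ("5", ...) columns sitting in between -- the "extra stuff".
one_slice = [c for c in columns if ("4", "") <= c < ("7", "")]
print(one_slice)  # includes the unwanted ("5", ...) columns

# If you know the sub-column names, ask for exact composite names
# instead (the "bonus": by-name retrieval, no extra data).
wanted = {("4", "greg"), ("4", "steve"), ("6", "greg")}
by_name = [c for c in columns if c in wanted]
print(by_name)
```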
RE: Replacing supercolumns with composite columns; Getting the equivalent of retrieving a list of supercolumns by name
The bonus you're talking about here, how do I apply that? For example, my columns are in the form of number.id such as 4.steve, 4.greg, 5.steve, 5.george. Is there a way to query a slice of numbers with a list of ids? As in, I want all the columns with numbers between 4 and 10 which have ids steve or greg. Cheers, Steve

-Original Message- From: Jeremiah Jordan [mailto:jeremiah.jor...@morningstar.com] Sent: Tuesday, January 03, 2012 3:12 PM To: user@cassandra.apache.org Cc: Asil Klin Subject: Re: Replacing supercolumns with composite columns; Getting the equivalent of retrieving a list of supercolumns by name

The main issue with replacing super columns with composite columns right now is that if you don't know all your sub-column names, you can't select multiple super columns' worth of data in the same query without getting extra stuff. You have to use a slice to get all subcolumns of a given super column, and you can't have disjoint slices, so if you want two super columns in full, you have to get all the other stuff that is in between them, or make two queries. If you know what all of the sub-column names are, you can ask for all of the super/sub column pairs for all of the super columns you want and not get extra data. If you don't need to pull multiple super columns at a time with slices like that, then there isn't really an issue. A bonus of using composite keys like this is that if there is a specific sub column you want from multiple super columns, you can pull all of those out with a single multiget and you don't have to pull the rest of the columns... So there are pros and cons... -Jeremiah

On 01/03/2012 01:58 PM, Asil Klin wrote: I have a super column family which I always use to retrieve a list of supercolumns (with all subcolumns) by name. I am looking forward to replacing all SuperColumns in my schema with composite columns. How could I design the schema so that I could do the equivalent of retrieving a list of supercolumns by name, when using composite columns? (As of now I thought of using the supercolumn name as the first component of the composite name and the subcolumn name as the 2nd component.)
RE: Suggestion about syntax of CREATE COLUMN FAMILY
I'd like to second this. I've been working with Cassandra for a good while now, but when I first started little things like this were confusing.

From: Don Smith [mailto:dsm...@likewise.com] Sent: Friday, December 09, 2011 3:41 PM To: user@cassandra.apache.org Subject: Suggestion about syntax of CREATE COLUMN FAMILY

Currently, the syntax for creating column families is like this:

create column family Users with comparator=UTF8Type and default_validation_class=UTF8Type and key_validation_class=UTF8Type;

It's not clear what comparator and default_validation_class refer to. Much clearer would be:

create column family Users with column_name_comparator=UTF8Type and column_value_validation_class=UTF8Type and key_validation_class=UTF8Type;

BTW, instead of column_name_comparator, I'd actually prefer column_key_comparator since it seems more accurate to call column names column keys. Don
Single node
Is there a way to set up a single node cluster without specifying anything about the specific machine in cassandra.yaml? I've cleared the values from listen_address and rpc_address, but it complains upon startup that no other nodes can be seen (presumably because the ip in the seeds doesn't match). The reason I'm trying to do this is because we deploy cassandra on each developer's machine, and we'd like to be able to use our client across machines using the hostname. Ideally, none of the developers would have to change the base config that gets deployed. The default config file works as a single node cluster, but won't let you talk to it across machines (we're using windows, in case it's relevant). Cheers, Steve
RE: Single node
Just solved it. I’m using localhost for the listen_address, 0.0.0.0 for the rpc_address, and 127.0.0.1 for the seeds. Cheers, Steve

From: Vijay [mailto:vijay2...@gmail.com] Sent: Thursday, December 08, 2011 2:15 PM To: user@cassandra.apache.org Subject: Re: Single node

You can add a DNS entry with multiple IPs, or something like an elastic IP which will keep switching between the active machines, or you can also write your own custom seed provider class. Not sure if you will get a quorum when the devs are on vacation :) Regards, /VJ

On Thu, Dec 8, 2011 at 11:05 AM, Stephen Pope stephen.p...@quest.com wrote: Is there a way to set up a single node cluster without specifying anything about the specific machine in cassandra.yaml? I’ve cleared the values from listen_address and rpc_address, but it complains upon startup that no other nodes can be seen (presumably because the ip in the seeds doesn’t match). The reason I’m trying to do this is because we deploy cassandra on each developer’s machine, and we’d like to be able to use our client across machines using the hostname. Ideally, none of the developers would have to change the base config that gets deployed. The default config file works as a single node cluster, but won’t let you talk to it across machines (we’re using windows, in case it’s relevant). Cheers, Steve
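Those working settings amount to a cassandra.yaml fragment along these lines (the seed_provider block follows the stock 0.8/1.0 config layout):

```yaml
# cassandra.yaml -- single node, reachable from other machines
listen_address: localhost    # gossip/internode traffic stays local
rpc_address: 0.0.0.0         # Thrift listens on all interfaces
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "127.0.0.1"
```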
cassandra.bat install
I've got the 1.0 rc2 binaries, but it looks like somebody forgot to include the Apache Daemon in the zip. According to the batch file there should be a bin\daemon directory, with a prunsrv executable in there. Cheers, Steve
Column Family names
Using 0.8.2, I've created a column family called _Schema (without the quotes). For some reason, I can't seem to list the rows in it from the cli. I've tried:

[default@BIM] list _Schema;
Syntax error at position 5: unexpected _ for `list _Schema;`.
[default@BIM] list '_Schema';
Syntax error at position 5: mismatched input ''_Schema'' expecting Identifier

Am I doing something wrong?

Also, after creating the (empty) column family, I then try to read the entire column family using get_range_slices. I'm using an empty byte array for the start key (and start column), and a byte array containing '\u' for the end key (and end column). When I do this, Cassandra throws this:

java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid2508.hprof ...
Heap dump file created [5211347 bytes in 0.100 secs]
ERROR 10:44:07,543 Internal error processing get_range_slices
java.lang.OutOfMemoryError: Java heap space
 at java.util.ArrayList.<init>(ArrayList.java:112)
 at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:670)
 at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:617)
 at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:3202)
 at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
 at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)

This even though I've got 8 GB of RAM in my machine and the Java process is only using 92 MB of memory. Has anyone seen this before? Cheers, Steve
RE: Column Family names
Hmm... I've tried changing my column family name to MySchema instead. Now the cli is behaving normally, but the OOM error still occurs when I get_range_slices from my code.

From: Stephen Pope [mailto:stephen.p...@quest.com] Sent: Thursday, August 25, 2011 11:10 AM To: user@cassandra.apache.org Subject: Column Family names

Using 0.8.2, I've created a column family called _Schema (without the quotes). For some reason, I can't seem to list the rows in it from the cli. I've tried:

[default@BIM] list _Schema;
Syntax error at position 5: unexpected _ for `list _Schema;`.
[default@BIM] list '_Schema';
Syntax error at position 5: mismatched input ''_Schema'' expecting Identifier

Am I doing something wrong?

Also, after creating the (empty) column family, I then try to read the entire column family using get_range_slices. I'm using an empty byte array for the start key (and start column), and a byte array containing '\u' for the end key (and end column). When I do this, Cassandra throws this:

java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid2508.hprof ...
Heap dump file created [5211347 bytes in 0.100 secs]
ERROR 10:44:07,543 Internal error processing get_range_slices
java.lang.OutOfMemoryError: Java heap space
 at java.util.ArrayList.<init>(ArrayList.java:112)
 at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:670)
 at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:617)
 at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:3202)
 at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
 at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)

This even though I've got 8 GB of RAM in my machine and the Java process is only using 92 MB of memory. Has anyone seen this before? Cheers, Steve
RE: Column Family names
Never mind. I've got a hard-coded Count on the KeyRange set to 2 billion, which is apparently beyond the maximum allowable.

From: Stephen Pope [mailto:stephen.p...@quest.com] Sent: Thursday, August 25, 2011 11:15 AM To: user@cassandra.apache.org Subject: RE: Column Family names

Hmm... I've tried changing my column family name to MySchema instead. Now the cli is behaving normally, but the OOM error still occurs when I get_range_slices from my code.

From: Stephen Pope [mailto:stephen.p...@quest.com] Sent: Thursday, August 25, 2011 11:10 AM To: user@cassandra.apache.org Subject: Column Family names

Using 0.8.2, I've created a column family called _Schema (without the quotes). For some reason, I can't seem to list the rows in it from the cli. I've tried:

[default@BIM] list _Schema;
Syntax error at position 5: unexpected _ for `list _Schema;`.
[default@BIM] list '_Schema';
Syntax error at position 5: mismatched input ''_Schema'' expecting Identifier

Am I doing something wrong?

Also, after creating the (empty) column family, I then try to read the entire column family using get_range_slices. I'm using an empty byte array for the start key (and start column), and a byte array containing '\u' for the end key (and end column). When I do this, Cassandra throws this:

java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid2508.hprof ...
Heap dump file created [5211347 bytes in 0.100 secs]
ERROR 10:44:07,543 Internal error processing get_range_slices
java.lang.OutOfMemoryError: Java heap space
 at java.util.ArrayList.<init>(ArrayList.java:112)
 at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:670)
 at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:617)
 at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:3202)
 at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
 at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)

This even though I've got 8 GB of RAM in my machine and the Java process is only using 92 MB of memory. Has anyone seen this before? Cheers, Steve
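The fix implied above is to page through the range with a bounded count instead of one huge request. A rough sketch of that paging loop (Python, with a toy stand-in for get_range_slices; the names and page size are illustrative, not Cassandra's API):

```python
# Stand-in for a column family: row key -> value, conceptually sorted.
rows = {f"key{i:03d}": i for i in range(10)}

def get_range_slices(start_key, count):
    """Stand-in for the Thrift call: up to `count` rows from start_key on."""
    keys = sorted(k for k in rows if k >= start_key)[:count]
    return [(k, rows[k]) for k in keys]

def scan_all(page_size=4):
    """Page through the whole CF; each page starts at the last key seen."""
    out, start = [], ""
    while True:
        page = get_range_slices(start, page_size)
        # After the first page, drop the first row: the range is inclusive
        # of the start key, so that row was already returned.
        out.extend(page if not out else page[1:])
        if len(page) < page_size:
            return out
        start = page[-1][0]

print(len(scan_all()))  # all 10 rows, fetched 4 at a time
```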
CompositeType
Hey, is there any documentation or examples of how to use the CompositeType? I can't find anything about it on the wiki or the datastax docs. Cheers, Steve
Aggregation and Co-Processors
I just finished watching the video by Eric Evans, "CQL: Not just NoSQL. It's MoSQL", and I heard mention of aggregation queries. He said there's been some talk about it, and that you guys were calling it co-processors. Can somebody give me the gist of what that's all about? I couldn't find any mention of it on the wiki. Cheers, Steve
cqlsh error using assume
I'm trying to use cqlsh (on Windows) to get some values from my database using secondary indexes. I'm not sure if it's something I'm doing or not (I can't seem to find any syntactical help for assume). I'm running:

assume TransactionLogs comparator as ascii

where TransactionLogs is my column family, and has string column names in it. The resulting (intuitive) error message is:

line 1:0 no viable alternative at input 'assume'

Anybody know what this means? Cheers, Steve
Modeling troubles
For a side project I'm working on I want to store the entire set of possible Reversi boards. There are an estimated 10^28 possible boards. Each board (from the best way I could think of to implement it) is made up of two 64-bit numbers (black pieces and white pieces; positions in neither are empty spaces) and a bit to indicate whose turn it is. I've thought of a few possible ways to do it:

- Entire board as row key, in an array of bytes. I'm not sure how well Cassandra can handle 10^28 rows. I could also break this up into separate cfs for each depth of move (initially there are 4 pieces on the board in total; I could make a cf for 5 pieces, 6, and so on up to 64). I'm not sure if there's any advantage to doing that.
- 64-bit number for the black pieces as row key, with 65-bit column names (white pieces + turn). I've read somewhere that there's a rough limit of 2 billion columns, so this will be problematic for certain. This can also be broken into separate cfs, but I'm still going to hit the column limit.

Is there a better way to achieve what I'm trying to do, or will either of these approaches surprise me and work properly?
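The first option's byte-array row key can be sketched like this (Python; the 17-byte layout is just one plausible encoding for this board representation, not something Cassandra dictates):

```python
import struct

def board_key(black: int, white: int, black_to_move: bool) -> bytes:
    """Pack a Reversi position into a 17-byte row key:
    8 bytes black bitboard + 8 bytes white bitboard + 1 turn byte,
    big-endian so identical boards always produce identical keys."""
    return struct.pack(">QQB", black, white, 1 if black_to_move else 0)

def unpack_key(key: bytes):
    """Inverse of board_key, for reading a row key back."""
    black, white, turn = struct.unpack(">QQB", key)
    return black, white, bool(turn)

# Standard Reversi starting position: four centre pieces, black to move.
start = board_key(0x0000000810000000, 0x0000001008000000, True)
assert len(start) == 17
assert unpack_key(start) == (0x0000000810000000, 0x0000001008000000, True)
```

With fixed-width big-endian fields, the keys also sort in a stable order by (black, white, turn), which is handy if an order-preserving partitioner or the per-depth cf split is ever used.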
RE: cqlsh error using assume
Boo-urns. Ok, thanks.

-Original Message- From: Brandon Williams [mailto:dri...@gmail.com] Sent: Thursday, July 21, 2011 9:10 AM To: user@cassandra.apache.org Subject: Re: cqlsh error using assume

'assume' is only valid in the cli, not cql.

On Thu, Jul 21, 2011 at 7:59 AM, Stephen Pope stephen.p...@quest.com wrote: I'm trying to use cqlsh (on Windows) to get some values from my database using secondary indexes. I'm not sure if it's something I'm doing or not (I can't seem to find any syntactical help for assume). I'm running: assume TransactionLogs comparator as ascii where TransactionLogs is my column family, and has string column names in it. The resulting (intuitive) error message is: line 1:0 no viable alternative at input 'assume' Anybody know what this means? Cheers, Steve
RE: sstabletojson
Perfect, thanks!

-Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Tuesday, July 12, 2011 5:53 PM To: user@cassandra.apache.org Subject: Re: sstabletojson

You can upgrade to 0.8.1 to fix this. :)

On Tue, Jul 12, 2011 at 1:03 PM, Stephen Pope stephen.p...@quest.com wrote: Hey there. I'm trying to convert one of my sstables to json, but it doesn't appear to be escaping quotes. As a result, I've got a line in my resulting json like this:

3230303930373139313734303236efbfbf3331313733: [[6d6573736167655f6964, 66AA9165386616028BD3FECF893BBAC204347F3BAF@CONFLICT,6.HUSHEDFIRE.COM, 634447747524175316]],

Attempting to convert this json back into an sstable results in:

C:\cassandra\apache-cassandra-0.8.0\bin>json2sstable.bat -K BIM -c TransactionLogs json.dat out.db
org.codehaus.jackson.JsonParseException: Unexpected character ('<' (code 60)): was expecting comma to separate ARRAY entries
 at [Source: json.dat; line: 31175, column: 299]
 at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:929)
 at org.codehaus.jackson.impl.JsonParserBase._reportError(JsonParserBase.java:632)
 at org.codehaus.jackson.impl.JsonParserBase._reportUnexpectedChar(JsonParserBase.java:565)
 at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:128)
 at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:81)
 at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:62)
 at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:82)
 at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:62)
 at org.codehaus.jackson.map.deser.MapDeserializer._readAndBind(MapDeserializer.java:197)
 at org.codehaus.jackson.map.deser.MapDeserializer.deserialize(MapDeserializer.java:145)
 at org.codehaus.jackson.map.deser.MapDeserializer.deserialize(MapDeserializer.java:23)
 at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:1261)
 at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:517)
 at org.codehaus.jackson.JsonParser.readValueAs(JsonParser.java:897)
 at org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport.java:263)
 at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:252)
 at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:476)

Is there anything I can do with my data to fix this? Cheers, Steve -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
BulkLoader
I'm trying to figure out how to use the BulkLoader, and it looks like there's no way to run it against a local machine, because of this:

Set<InetAddress> hosts = Gossiper.instance.getLiveMembers();
hosts.remove(FBUtilities.getLocalAddress());
if (hosts.isEmpty())
    throw new IllegalStateException("Cannot load any sstable, no live member found in the cluster");

Is this intended behavior? May I ask why? We'd like to be able to run it against the local machine. Cheers, Steve
RE: BulkLoader
I think I've solved my own problem here. After generating the sstable using json2sstable it looks like I can simply copy the created sstable into my data directory. Can anyone think of any potential problems with doing it this way?

-Original Message- From: Stephen Pope [mailto:stephen.p...@quest.com] Sent: Wednesday, July 13, 2011 9:32 AM To: user@cassandra.apache.org Subject: BulkLoader

I'm trying to figure out how to use the BulkLoader, and it looks like there's no way to run it against a local machine, because of this:

Set<InetAddress> hosts = Gossiper.instance.getLiveMembers();
hosts.remove(FBUtilities.getLocalAddress());
if (hosts.isEmpty())
    throw new IllegalStateException("Cannot load any sstable, no live member found in the cluster");

Is this intended behavior? May I ask why? We'd like to be able to run it against the local machine. Cheers, Steve
RE: BulkLoader
Fair enough. My original question stands then. :) Why aren't you allowed to talk to a local installation using BulkLoader?

-Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Wednesday, July 13, 2011 11:06 AM To: user@cassandra.apache.org Subject: Re: BulkLoader

Sure, that will work fine with a single machine. The advantage of bulkloader is it handles splitting the sstable up and sending each piece to the right place(s) when you have more than one.

On Wed, Jul 13, 2011 at 7:47 AM, Stephen Pope stephen.p...@quest.com wrote: I think I've solved my own problem here. After generating the sstable using json2sstable it looks like I can simply copy the created sstable into my data directory. Can anyone think of any potential problems with doing it this way?

-Original Message- From: Stephen Pope [mailto:stephen.p...@quest.com] Sent: Wednesday, July 13, 2011 9:32 AM To: user@cassandra.apache.org Subject: BulkLoader

I'm trying to figure out how to use the BulkLoader, and it looks like there's no way to run it against a local machine, because of this:

Set<InetAddress> hosts = Gossiper.instance.getLiveMembers();
hosts.remove(FBUtilities.getLocalAddress());
if (hosts.isEmpty())
    throw new IllegalStateException("Cannot load any sstable, no live member found in the cluster");

Is this intended behavior? May I ask why? We'd like to be able to run it against the local machine. Cheers, Steve -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
RE: BulkLoader
Ahhh.. ok. Thanks.

-Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Wednesday, July 13, 2011 11:35 AM To: user@cassandra.apache.org Subject: Re: BulkLoader

Because it's hooking directly into gossip, so the local instance it's ignoring is the bulkloader process, not Cassandra. You'd need to run the bulkloader from a different IP than Cassandra.

On Wed, Jul 13, 2011 at 8:22 AM, Stephen Pope stephen.p...@quest.com wrote: Fair enough. My original question stands then. :) Why aren't you allowed to talk to a local installation using BulkLoader?

-Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Wednesday, July 13, 2011 11:06 AM To: user@cassandra.apache.org Subject: Re: BulkLoader

Sure, that will work fine with a single machine. The advantage of bulkloader is it handles splitting the sstable up and sending each piece to the right place(s) when you have more than one.

On Wed, Jul 13, 2011 at 7:47 AM, Stephen Pope stephen.p...@quest.com wrote: I think I've solved my own problem here. After generating the sstable using json2sstable it looks like I can simply copy the created sstable into my data directory. Can anyone think of any potential problems with doing it this way?

-Original Message- From: Stephen Pope [mailto:stephen.p...@quest.com] Sent: Wednesday, July 13, 2011 9:32 AM To: user@cassandra.apache.org Subject: BulkLoader

I'm trying to figure out how to use the BulkLoader, and it looks like there's no way to run it against a local machine, because of this:

Set<InetAddress> hosts = Gossiper.instance.getLiveMembers();
hosts.remove(FBUtilities.getLocalAddress());
if (hosts.isEmpty())
    throw new IllegalStateException("Cannot load any sstable, no live member found in the cluster");

Is this intended behavior? May I ask why? We'd like to be able to run it against the local machine. Cheers, Steve

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
sstabletojson
Hey there. I'm trying to convert one of my sstables to json, but it doesn't appear to be escaping quotes. As a result, I've got a line in my resulting json like this:

3230303930373139313734303236efbfbf3331313733: [[6d6573736167655f6964, 66AA9165386616028BD3FECF893BBAC204347F3BAF@CONFLICT,6.HUSHEDFIRE.COM, 634447747524175316]],

Attempting to convert this json back into an sstable results in:

C:\cassandra\apache-cassandra-0.8.0\bin>json2sstable.bat -K BIM -c TransactionLogs json.dat out.db
org.codehaus.jackson.JsonParseException: Unexpected character ('<' (code 60)): was expecting comma to separate ARRAY entries
 at [Source: json.dat; line: 31175, column: 299]
 at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:929)
 at org.codehaus.jackson.impl.JsonParserBase._reportError(JsonParserBase.java:632)
 at org.codehaus.jackson.impl.JsonParserBase._reportUnexpectedChar(JsonParserBase.java:565)
 at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:128)
 at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:81)
 at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:62)
 at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:82)
 at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:62)
 at org.codehaus.jackson.map.deser.MapDeserializer._readAndBind(MapDeserializer.java:197)
 at org.codehaus.jackson.map.deser.MapDeserializer.deserialize(MapDeserializer.java:145)
 at org.codehaus.jackson.map.deser.MapDeserializer.deserialize(MapDeserializer.java:23)
 at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:1261)
 at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:517)
 at org.codehaus.jackson.JsonParser.readValueAs(JsonParser.java:897)
 at org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport.java:263)
 at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:252)
 at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:476)

Is there anything I can do with my data to fix this? Cheers, Steve
bulk load
According to the README.txt in examples/bmt BinaryMemtable is being deprecated. What's the recommended way to do bulk loading? Cheers, Steve
RE: bulk load
Awesome, thanks!

-Original Message- From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com] Sent: Wednesday, June 22, 2011 3:08 PM To: user@cassandra.apache.org Subject: Re: bulk load

This ticket's outcome replaces what BMT was supposed to do: https://issues.apache.org/jira/browse/CASSANDRA-1278 0.8.1 is being voted on now and will hopefully be out in the next day or two. You can try it out with the 0.8 branch if you want - looking near the bottom of the comments on the ticket, it has impressive performance.

On Jun 22, 2011, at 2:00 PM, Stephen Pope wrote: According to the README.txt in examples/bmt BinaryMemtable is being deprecated. What's the recommended way to do bulk loading? Cheers, Steve
CommitLog replay
Hi there. This is my first message to the mailing list, so let me know if I'm doing it wrong. :) I've got a single node deployment of 0.8 set up on my windows box. When I insert a bunch of data into it, the commitlogs directory doesn't clear upon completion (should it?). As a result, when I stop and restart Cassandra it replays all the commitlogs, then starts compacting (which seems like it's taking a long time). While it's compacting it won't talk to my test client. Am I doing something wrong? Cheers, Steve
RE: CommitLog replay
I've only got one cf, and haven't changed the default flush expiry period. I'm not sure whether the node had fully started or not. I had to restart my data insertion (for other reasons), so I can check the system log upon restart when the data is finished inserting. Do you know off-hand how long the default flush expiry period is? Cheers, Steve

-Original Message- From: sc...@scode.org [mailto:sc...@scode.org] On Behalf Of Peter Schuller Sent: Tuesday, June 21, 2011 9:13 AM To: user@cassandra.apache.org Subject: Re: CommitLog replay

I’ve got a single node deployment of 0.8 set up on my windows box. When I insert a bunch of data into it, the commitlogs directory doesn’t clear upon completion (should it?).

It is expected that commit logs are retained for a while, and that there is replay going on when restarting a node. The main way to ensure that a smaller amount of commit log is active at any given moment is to ensure that all column families are flushed sufficiently often. This is because when column families are flushed, they no longer necessitate the retention of the commit logs that contain the writes that were just flushed. Pay attention to whether you maybe have some cf:s that are written very rarely and won't flush until the flush expiry period.

As a result, when I stop and restart Cassandra it replays all the commitlogs, then starts compacting (which seems like it’s taking a long time). While it’s compacting it won’t talk to my test client.

That it starts compacting is expected if the data flushed as a result of the commit log replay triggers compactions. However, compaction does not imply that the node refuses to talk to clients. Are you sure the node has fully started? It should log when it starts up the thrift interface - check system.log. -- / Peter Schuller