RE: Get few rows by composite key.

2012-03-19 Thread Stephen Pope
I'm not sure about Hector code (somebody else can chime in here), but to find 
the keys you're after you can slice to get the keys from AA:BB to BB:AA.

Cheers,
Steve

From: Michael Cherkasov [mailto:michael.cherka...@gmail.com]
Sent: Monday, March 19, 2012 9:30 AM
To: user@cassandra.apache.org
Subject: Get few rows by composite key.

Hello,
Assume we have a table like this one:

Key      Column names
AA:AA 1:A 1:B 1:C 2:A 2:C
AA:BB 1:C 2:A 2:C
AA:CC 2:A 2:C
AA:DD 1:A 1:B 1:C
BB:AA 1:A 1:B 2:C
BB:BB 1:A 1:B 1:C 2:C
BB:CC 1:A  2:A 2:C
BB:DD 1:A  1:C 2:A 2:C

Is there any way to fetch rows whose first key component equals AA and whose 
second component is greater than or equal to BB?
I'm interested in the Hector code for this.
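The slice Stephen suggests works because CompositeType keys compare component by component, so a single contiguous range can cover "first part = AA, second part >= BB". Below is a minimal pure-Java sketch of that encoding and ordering (the class and method names are illustrative, and the layout — 2-byte length, component bytes, end-of-component byte — is a simplification of CompositeType's format; in Hector you would build Composite start/end keys for a RangeSlicesQuery instead). Note the caveat that key-range slices are only meaningfully ordered under an order-preserving partitioner; under RandomPartitioner rows come back in token order.

```java
import java.io.ByteArrayOutputStream;

// Sketch of CompositeType-style byte encoding: each component is a
// 2-byte big-endian length, the component bytes, then an end-of-component
// byte (0 = exact value, 1 = "match everything after this prefix").
public class CompositeKeyDemo {
    static byte[] composite(byte eoc, String... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < parts.length; i++) {
            byte[] b = parts[i].getBytes();
            out.write((b.length >> 8) & 0xFF);
            out.write(b.length & 0xFF);
            out.write(b, 0, b.length);
            out.write(i == parts.length - 1 ? eoc : 0);
        }
        return out.toByteArray();
    }

    // Unsigned lexicographic comparison, the order Cassandra uses for
    // composite comparators.
    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] start = composite((byte) 0, "AA", "BB"); // slice start: AA:BB
        byte[] end   = composite((byte) 1, "AA");       // end of the AA prefix
        String[][] rows = { {"AA","AA"}, {"AA","BB"}, {"AA","CC"},
                            {"AA","DD"}, {"BB","AA"} };
        for (String[] r : rows) {
            byte[] key = composite((byte) 0, r[0], r[1]);
            boolean in = compare(key, start) >= 0 && compare(key, end) <= 0;
            System.out.println(r[0] + ":" + r[1] + " in slice = " + in);
        }
    }
}
```

Running this shows AA:BB, AA:CC, and AA:DD falling inside the slice while AA:AA and BB:AA fall outside, which is exactly the "first part = AA, second >= BB" selection from the table above.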


RE: Replacing supercolumns with composite columns; Getting the equivalent of retrieving a list of supercolumns by name

2012-01-04 Thread Stephen Pope
I don't think I can tell my exact column names in many cases. For example, most 
of our queries are for specific keys and an unknown range of numbers (like 
key1, key where number > 1). How can I set up my slice in this case to 
retrieve only the columns that match both criteria?

Cheers,
Steve

From: rajkumar@gmail.com [mailto:rajkumar@gmail.com] On Behalf Of Asil 
Klin
Sent: Wednesday, January 04, 2012 12:21 AM
To: user@cassandra.apache.org
Subject: Re: Replacing supercolumns with composite columns; Getting the 
equivalent of retrieving a list of supercolumns by name

@Stephen: in that case, you can easily tell the names of all the columns you 
want to retrieve, so you can make a single query for that list of composite 
columns.


@Jeremiah,
So what is my best bet? Should I leave the supercolumns as they are for now, 
since I can't find a good way to use them if I replace them with composite 
columns?


On Wed, Jan 4, 2012 at 4:01 AM, Stephen Pope stephen.p...@quest.com wrote:
 The bonus you're talking about here, how do I apply that?

 For example, my columns are in the form of number.id such as 
4.steve, 4.greg, 5.steve, 5.george. Is there a way to query a slice of numbers 
with a list of ids? As in, I want all the columns with numbers between 4 and 10 
which have ids steve or greg.

 Cheers,
 Steve
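One way to answer this question with Jeremiah's "bonus" (a sketch, not from the thread; ColumnNameExpander and expand are illustrative names): since both the number range and the id list are small and enumerable, expand their cross-product into explicit composite column names and fetch those by name in a single query, instead of trying to express a disjoint slice.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: expand a number range x an id list into explicit "number.id"
// column names, which can then be requested by exact name in one query.
public class ColumnNameExpander {
    static List<String> expand(int lo, int hi, String... ids) {
        List<String> names = new ArrayList<>();
        for (int n = lo; n <= hi; n++)
            for (String id : ids)
                names.add(n + "." + id);
        return names;
    }

    public static void main(String[] args) {
        // all columns with numbers 4..10 and ids steve or greg
        System.out.println(expand(4, 10, "steve", "greg"));
    }
}
```

This only works when the range is small enough to enumerate; for a genuinely open-ended number range you are back to slicing and filtering client-side, as Jeremiah describes.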

-Original Message-
From: Jeremiah Jordan [mailto:jeremiah.jor...@morningstar.com]
Sent: Tuesday, January 03, 2012 3:12 PM
To: user@cassandra.apache.org
Cc: Asil Klin
Subject: Re: Replacing supercolumns with composite columns; Getting the 
equivalent of retrieving a list of supercolumns by name

The main issue with replacing super columns with composite columns right now is 
that if you don't know all your sub-column names you can't select multiple 
super columns worth of data in the same query without getting extra stuff.  
You have to use a slice to get all subcolumns of a given super column, and you 
can't have disjoint slices, so if you want two super columns full, you have to 
get all the other stuff that is in between them, or make two queries.
If you know what all of the sub-column names are you can ask for all of the 
super/sub column pairs for all of the super columns you want and not get extra 
data.

If you don't need to pull multiple super columns at a time with slices like 
that, then there isn't really an issue.

A bonus of using composite keys like this is that if there is a specific sub 
column you want from multiple super columns, you can pull all those out with a 
single multiget and you don't have to pull the rest of the columns...

So there are pros and cons...

-Jeremiah


On 01/03/2012 01:58 PM, Asil Klin wrote:
 I have a super column family which I always use to retrieve a list of
 supercolumns (with all subcolumns) by name. I am looking to
 replace all SuperColumns in my schema with composite columns.

 How could I design the schema so that I could do the equivalent of
 retrieving a list of supercolumns by name when using composite
 columns?

 (As of now I thought of using the supercolumn name as the first
 component of the composite name and the subcolumn name as 2nd
 component of composite name.)



RE: Replacing supercolumns with composite columns; Getting the equivalent of retrieving a list of supercolumns by name

2012-01-03 Thread Stephen Pope
 The bonus you're talking about here, how do I apply that?

 For example, my columns are in the form of number.id such as 4.steve, 4.greg, 
5.steve, 5.george. Is there a way to query a slice of numbers with a list of 
ids? As in, I want all the columns with numbers between 4 and 10 which have ids 
steve or greg.

 Cheers,
 Steve

-Original Message-
From: Jeremiah Jordan [mailto:jeremiah.jor...@morningstar.com] 
Sent: Tuesday, January 03, 2012 3:12 PM
To: user@cassandra.apache.org
Cc: Asil Klin
Subject: Re: Replacing supercolumns with composite columns; Getting the 
equivalent of retrieving a list of supercolumns by name

The main issue with replacing super columns with composite columns right now is 
that if you don't know all your sub-column names you can't select multiple 
super columns worth of data in the same query without getting extra stuff.  
You have to use a slice to get all subcolumns of a given super column, and you 
can't have disjoint slices, so if you want two super columns full, you have to 
get all the other stuff that is in between them, or make two queries.
If you know what all of the sub-column names are you can ask for all of the 
super/sub column pairs for all of the super columns you want and not get extra 
data.

If you don't need to pull multiple super columns at a time with slices like 
that, then there isn't really an issue.

A bonus of using composite keys like this is that if there is a specific sub 
column you want from multiple super columns, you can pull all those out with a 
single multiget and you don't have to pull the rest of the columns...

So there are pros and cons...

-Jeremiah


On 01/03/2012 01:58 PM, Asil Klin wrote:
 I have a super column family which I always use to retrieve a list of 
 supercolumns (with all subcolumns) by name. I am looking to 
 replace all SuperColumns in my schema with composite columns.

 How could I design the schema so that I could do the equivalent of 
 retrieving a list of supercolumns by name when using composite 
 columns?

 (As of now I thought of using the supercolumn name as the first 
 component of the composite name and the subcolumn name as 2nd 
 component of composite name.)


RE: Suggestion about syntax of CREATE COLUMN FAMILY

2011-12-12 Thread Stephen Pope
I'd like to second this. I've been working with Cassandra for a good while now, 
but when I first started little things like this were confusing.

From: Don Smith [mailto:dsm...@likewise.com]
Sent: Friday, December 09, 2011 3:41 PM
To: user@cassandra.apache.org
Subject: Suggestion about syntax of CREATE COLUMN FAMILY

Currently, the syntax for creating column families is like this:
create column family Users
with comparator=UTF8Type
and default_validation_class=UTF8Type
and key_validation_class=UTF8Type;

It's not clear what comparator and default_validation_class refer to. Much 
clearer would be:
create column family Users
with column_name_comparator=UTF8Type
and column_value_validation_class=UTF8Type
and key_validation_class=UTF8Type;

BTW, instead of column_name_comparator, I'd actually prefer 
column_key_comparator since it seems more accurate to call column names 
column keys.

  Don


Single node

2011-12-08 Thread Stephen Pope
Is there a way to set up a single node cluster without specifying anything 
about the specific machine in cassandra.yaml? I've cleared the values from 
listen_address and rpc_address, but it complains upon startup that no other 
nodes can be seen (presumably because the ip in the seeds doesn't match).

The reason I'm trying to do this is because we deploy cassandra on each 
developer's machine, and we'd like to be able to use our client across machines 
using the hostname. Ideally, none of the developers would have to change the 
base config that gets deployed.

The default config file works as a single node cluster, but won't let you talk 
to it across machines (we're using windows, in case it's relevant).

Cheers,
Steve


RE: Single node

2011-12-08 Thread Stephen Pope
Just solved it. I’m using localhost for the listen_address, 0.0.0.0 for the 
rpc_address, and 127.0.0.1 for the seeds.
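For reference, the corresponding cassandra.yaml fragment would look roughly like this (a sketch; the seed_provider layout varies slightly across Cassandra versions, so treat the structure as illustrative):

    listen_address: localhost   # gossip binds via the loopback name
    rpc_address: 0.0.0.0        # Thrift accepts connections on all interfaces
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "127.0.0.1"

Since every developer's machine resolves localhost/127.0.0.1 to itself, the same file deploys unchanged everywhere, while 0.0.0.0 keeps the Thrift port reachable from other machines by hostname.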

Cheers,
Steve

From: Vijay [mailto:vijay2...@gmail.com]
Sent: Thursday, December 08, 2011 2:15 PM
To: user@cassandra.apache.org
Subject: Re: Single node

You can add a DNS entry with multiple IPs, or something like an elastic IP which 
will keep switching between the active machines. Or you can write your own 
custom seed provider class. Not sure if you will get a quorum when the devs 
are on vacation :)

Regards,
/VJ


On Thu, Dec 8, 2011 at 11:05 AM, Stephen Pope stephen.p...@quest.com wrote:
Is there a way to set up a single node cluster without specifying anything 
about the specific machine in cassandra.yaml? I’ve cleared the values from 
listen_address and rpc_address, but it complains upon startup that no other 
nodes can be seen (presumably because the ip in the seeds doesn’t match).

The reason I’m trying to do this is because we deploy cassandra on each 
developer’s machine, and we’d like to be able to use our client across machines 
using the hostname. Ideally, none of the developers would have to change the 
base config that gets deployed.

The default config file works as a single node cluster, but won’t let you talk 
to it across machines (we’re using windows, in case it’s relevant).

Cheers,
Steve



cassandra.bat install

2011-10-06 Thread Stephen Pope
I've got the 1.0 rc2 binaries, but it looks like somebody forgot to include the 
Apache Daemon in the zip. According to the batch file there should be a 
bin\daemon directory, with a prunsrv executable in there.

Cheers,
Steve


Column Family names

2011-08-25 Thread Stephen Pope
Using 0.8.2, I've created a column family called "_Schema" (without the 
quotes). For some reason, I can't seem to list the rows in it from the cli:

I've tried:

[default@BIM] list _Schema;
Syntax error at position 5: unexpected _ for `list _Schema;`.
[default@BIM] list '_Schema';
Syntax error at position 5: mismatched input ''_Schema'' expecting Identifier
[default@BIM] list "_Schema";
Syntax error at position 5: unexpected " for `list "_Schema";`.

Am I doing something wrong?

Also, after creating the (empty) column family, I then try to read the entire 
column family using get_range_slices. I'm using an empty byte array for the 
start key (and start column), and a byte array containing '\u' for the end 
key (and end column). When I do this, Cassandra throws this:

java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid2508.hprof ...
Heap dump file created [5211347 bytes in 0.100 secs]
ERROR 10:44:07,543 Internal error processing get_range_slices
java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(ArrayList.java:112)
at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.
java:670)
at org.apache.cassandra.thrift.CassandraServer.get_range_slices(Cassandr
aServer.java:617)
at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.proc
ess(Cassandra.java:3202)
at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.jav
a:2889)
at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run
(CustomTThreadPoolServer.java:187)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExec
utor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
.java:908)
at java.lang.Thread.run(Thread.java:662)

Even though I've got 8GB of ram in my machine, and the java process is only 
using 92MB of memory.

Has anyone seen this before?

Cheers,
Steve



RE: Column Family names

2011-08-25 Thread Stephen Pope
Hmm...I've tried changing my column family name to MySchema instead. Now the 
cli is behaving normally, but the OOM error still occurs when I 
get_range_slices from my code.

From: Stephen Pope [mailto:stephen.p...@quest.com]
Sent: Thursday, August 25, 2011 11:10 AM
To: user@cassandra.apache.org
Subject: Column Family names

Using 0.8.2, I've created a column family called "_Schema" (without the 
quotes). For some reason, I can't seem to list the rows in it from the cli:

I've tried:

[default@BIM] list _Schema;
Syntax error at position 5: unexpected _ for `list _Schema;`.
[default@BIM] list '_Schema';
Syntax error at position 5: mismatched input ''_Schema'' expecting Identifier
[default@BIM] list "_Schema";
Syntax error at position 5: unexpected " for `list "_Schema";`.

Am I doing something wrong?

Also, after creating the (empty) column family, I then try to read the entire 
column family using get_range_slices. I'm using an empty byte array for the 
start key (and start column), and a byte array containing '\u' for the end 
key (and end column). When I do this, Cassandra throws this:

java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid2508.hprof ...
Heap dump file created [5211347 bytes in 0.100 secs]
ERROR 10:44:07,543 Internal error processing get_range_slices
java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(ArrayList.java:112)
at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.
java:670)
at org.apache.cassandra.thrift.CassandraServer.get_range_slices(Cassandr
aServer.java:617)
at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.proc
ess(Cassandra.java:3202)
at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.jav
a:2889)
at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run
(CustomTThreadPoolServer.java:187)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExec
utor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
.java:908)
at java.lang.Thread.run(Thread.java:662)

Even though I've got 8GB of ram in my machine, and the java process is only 
using 92MB of memory.

Has anyone seen this before?

Cheers,
Steve



RE: Column Family names

2011-08-25 Thread Stephen Pope
Never mind. I've got a hard-coded Count on the KeyRange set to 2 billion, which 
is apparently beyond the maximum allowable.

From: Stephen Pope [mailto:stephen.p...@quest.com]
Sent: Thursday, August 25, 2011 11:15 AM
To: user@cassandra.apache.org
Subject: RE: Column Family names

Hmm...I've tried changing my column family name to MySchema instead. Now the 
cli is behaving normally, but the OOM error still occurs when I 
get_range_slices from my code.

From: Stephen Pope [mailto:stephen.p...@quest.com]
Sent: Thursday, August 25, 2011 11:10 AM
To: user@cassandra.apache.org
Subject: Column Family names

Using 0.8.2, I've created a column family called "_Schema" (without the 
quotes). For some reason, I can't seem to list the rows in it from the cli:

I've tried:

[default@BIM] list _Schema;
Syntax error at position 5: unexpected _ for `list _Schema;`.
[default@BIM] list '_Schema';
Syntax error at position 5: mismatched input ''_Schema'' expecting Identifier
[default@BIM] list "_Schema";
Syntax error at position 5: unexpected " for `list "_Schema";`.

Am I doing something wrong?

Also, after creating the (empty) column family, I then try to read the entire 
column family using get_range_slices. I'm using an empty byte array for the 
start key (and start column), and a byte array containing '\u' for the end 
key (and end column). When I do this, Cassandra throws this:

java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid2508.hprof ...
Heap dump file created [5211347 bytes in 0.100 secs]
ERROR 10:44:07,543 Internal error processing get_range_slices
java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(ArrayList.java:112)
at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.
java:670)
at org.apache.cassandra.thrift.CassandraServer.get_range_slices(Cassandr
aServer.java:617)
at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.proc
ess(Cassandra.java:3202)
at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.jav
a:2889)
at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run
(CustomTThreadPoolServer.java:187)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExec
utor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
.java:908)
at java.lang.Thread.run(Thread.java:662)

Even though I've got 8GB of ram in my machine, and the java process is only 
using 92MB of memory.

Has anyone seen this before?

Cheers,
Steve



CompositeType

2011-08-15 Thread Stephen Pope
 Hey, is there any documentation or examples of how to use the CompositeType? I 
can't find anything about it on the wiki or the datastax docs.

 Cheers,
 Steve


Aggregation and Co-Processors

2011-07-28 Thread Stephen Pope
I just finished watching the video by Eric Evans, "CQL: Not just NoSQL. It's 
MoSQL", and I heard mention of aggregation queries. He said there's been some 
talk about it, and that you guys were calling it co-processors. Can somebody 
give me the gist of what that's all about? I couldn't find any mention of it on 
the wiki.

Cheers,
Steve


cqlsh error using assume

2011-07-21 Thread Stephen Pope
I'm trying to use cqlsh (on Windows) to get some values from my database using 
secondary indexes. I'm not sure if it's something I'm doing or not (I can't 
seem to find any syntactical help for assume). I'm running:

assume TransactionLogs comparator as ascii

where TransactionLogs is my column family, and has string column names in it. 
The resulting (intuitive) error message is:

line 1:0 no viable alternative at input 'assume'

Anybody know what this means?

Cheers,
Steve


Modeling troubles

2011-07-21 Thread Stephen Pope
For a side project I'm working on I want to store the entire set of possible 
Reversi boards. There are an estimated 10^28 possible boards. Each board (from 
the best way I could think of to implement it) is made up of two 64-bit numbers 
(black pieces, white pieces; squares in neither are empty) and a bit to 
indicate whose turn it is. I've thought of a few possible ways to do it:


-  Entire board as row key, in an array of bytes. I'm not sure how well 
Cassandra can handle 10^28 rows. I could also break this up into separate cfs 
for each depth of move (initially there are 4 pieces on the board in total. I 
could make a cf for 5 piece, 6, etc to 64). I'm not sure if there's any 
advantage to doing that.

-  64-bit number for the black pieces as row key, with 65-bit column 
names (white pieces + turn). I've read somewhere that there's a rough limit of 
2 billion columns per row, so this will be problematic for certain. This can 
also be broken into separate cfs, but I'm still going to hit the column limit

Is there a better way to achieve what I'm trying to do, or will either of these 
approaches surprise me and work properly?
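The first option's row key can be sketched as follows (the class and method names are illustrative): the two 64-bit bitmasks plus the turn bit pack into a fixed 17-byte array, giving compact, uniformly sized keys.

```java
import java.nio.ByteBuffer;

// Sketch: pack a Reversi board (two 64-bit occupancy bitmasks plus a
// whose-turn bit) into a fixed 17-byte row key.
public class BoardKey {
    static byte[] encode(long black, long white, boolean blackToMove) {
        return ByteBuffer.allocate(17)
                .putLong(black)                        // bytes 0-7
                .putLong(white)                        // bytes 8-15
                .put((byte) (blackToMove ? 1 : 0))     // byte 16
                .array();
    }

    static long[] decode(byte[] key) {
        ByteBuffer b = ByteBuffer.wrap(key);
        return new long[] { b.getLong(), b.getLong(), b.get() };
    }

    public static void main(String[] args) {
        // standard Reversi start: four centre pieces, black to move
        long black = (1L << 28) | (1L << 35);
        long white = (1L << 27) | (1L << 36);
        byte[] key = encode(black, white, true);
        System.out.println(key.length + " bytes"); // 17 bytes
        long[] back = decode(key);
        System.out.println(back[0] == black && back[1] == white);
    }
}
```

Splitting by piece count would then just mean routing each encoded key to a per-depth column family; the key format itself stays the same.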


RE: cqlsh error using assume

2011-07-21 Thread Stephen Pope
 Boo-urns. Ok, thanks.

-Original Message-
From: Brandon Williams [mailto:dri...@gmail.com] 
Sent: Thursday, July 21, 2011 9:10 AM
To: user@cassandra.apache.org
Subject: Re: cqlsh error using assume

'assume' is only valid in the cli, not cql.

On Thu, Jul 21, 2011 at 7:59 AM, Stephen Pope stephen.p...@quest.com wrote:
 I'm trying to use cqlsh (on Windows) to get some values from my database
 using secondary indexes. I'm not sure if it's something I'm doing or not (I
 can't seem to find any syntactical help for assume). I'm running:



 assume TransactionLogs comparator as ascii



 where TransactionLogs is my column family, and has string column names in
 it. The resulting (intuitive) error message is:



 line 1:0 no viable alternative at input 'assume'



 Anybody know what this means?



 Cheers,

 Steve


RE: sstabletojson

2011-07-13 Thread Stephen Pope
 Perfect, thanks!

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Tuesday, July 12, 2011 5:53 PM
To: user@cassandra.apache.org
Subject: Re: sstabletojson

You can upgrade to 0.8.1 to fix this. :)

On Tue, Jul 12, 2011 at 1:03 PM, Stephen Pope stephen.p...@quest.com wrote:
  Hey there. I'm trying to convert one of my sstables to json, but it doesn't 
 appear to be escaping quotes. As a result, I've got a line in my resulting 
 json like this:

 3230303930373139313734303236efbfbf3331313733: [[6d6573736167655f6964, 
 66AA9165386616028BD3FECF893BBAC204347F3BAF@CONFLICT,6.HUSHEDFIRE.COM, 
 634447747524175316]],

  Attempting to convert this json back into an sstable results in:

C:\cassandra\apache-cassandra-0.8.0\bin>json2sstable.bat -K BIM -c 
TransactionLogs json.dat out.db

org.codehaus.jackson.JsonParseException: Unexpected character ('<' (code 60)): 
was expecting comma to separate ARRAY entries
 at [Source: json.dat; line: 31175, column: 299]
        at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:929)
        at 
 org.codehaus.jackson.impl.JsonParserBase._reportError(JsonParserBase.
 java:632)
        at 
 org.codehaus.jackson.impl.JsonParserBase._reportUnexpectedChar(JsonPa
 rserBase.java:565)
        at 
 org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser
 .java:128)
        at 
 org.codehaus.jackson.map.deser.UntypedObjectDeserializer.mapArray(Unt
 ypedObjectDeserializer.java:81)
        at 
 org.codehaus.jackson.map.deser.UntypedObjectDeserializer.deserialize(
 UntypedObjectDeserializer.java:62)
        at 
 org.codehaus.jackson.map.deser.UntypedObjectDeserializer.mapArray(Unt
 ypedObjectDeserializer.java:82)
        at 
 org.codehaus.jackson.map.deser.UntypedObjectDeserializer.deserialize(
 UntypedObjectDeserializer.java:62)
        at 
 org.codehaus.jackson.map.deser.MapDeserializer._readAndBind(MapDeseri
 alizer.java:197)
        at 
 org.codehaus.jackson.map.deser.MapDeserializer.deserialize(MapDeseria
 lizer.java:145)
        at 
 org.codehaus.jackson.map.deser.MapDeserializer.deserialize(MapDeseria
 lizer.java:23)
        at 
 org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:12
 61)
        at 
 org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:517
 )
        at org.codehaus.jackson.JsonParser.readValueAs(JsonParser.java:897)
        at 
 org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport
 .java:263)
        at 
 org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.jav
 a:252)
        at 
 org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:476)


  Is there anything I can do with my data to fix this?

  Cheers,
  Steve




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


BulkLoader

2011-07-13 Thread Stephen Pope
 I'm trying to figure out how to use the BulkLoader, and it looks like there's 
no way to run it against a local machine, because of this:

Set<InetAddress> hosts = Gossiper.instance.getLiveMembers();
hosts.remove(FBUtilities.getLocalAddress());
if (hosts.isEmpty())
    throw new IllegalStateException("Cannot load any sstable, no live member found in the cluster");

 Is this intended behavior? May I ask why? We'd like to be able to run it 
against the local machine.

 Cheers,
 Steve


RE: BulkLoader

2011-07-13 Thread Stephen Pope
 I think I've solved my own problem here. After generating the sstable using 
json2sstable it looks like I can simply copy the created sstable into my data 
directory.

 Can anyone think of any potential problems with doing it this way?

-Original Message-
From: Stephen Pope [mailto:stephen.p...@quest.com] 
Sent: Wednesday, July 13, 2011 9:32 AM
To: user@cassandra.apache.org
Subject: BulkLoader

 I'm trying to figure out how to use the BulkLoader, and it looks like there's 
no way to run it against a local machine, because of this:

Set<InetAddress> hosts = Gossiper.instance.getLiveMembers();
hosts.remove(FBUtilities.getLocalAddress());
if (hosts.isEmpty())
    throw new IllegalStateException("Cannot load any sstable, no live member found in the cluster");

 Is this intended behavior? May I ask why? We'd like to be able to run it 
against the local machine.

 Cheers,
 Steve


RE: BulkLoader

2011-07-13 Thread Stephen Pope
 Fair enough. My original question stands then. :) 

 Why aren't you allowed to talk to a local installation using BulkLoader?

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Wednesday, July 13, 2011 11:06 AM
To: user@cassandra.apache.org
Subject: Re: BulkLoader

Sure, that will work fine with a single machine.  The advantage of
bulkloader is it handles splitting the sstable up and sending each
piece to the right place(s) when you have more than one.

On Wed, Jul 13, 2011 at 7:47 AM, Stephen Pope stephen.p...@quest.com wrote:
  I think I've solved my own problem here. After generating the sstable using 
 json2sstable it looks like I can simply copy the created sstable into my data 
 directory.

  Can anyone think of any potential problems with doing it this way?

 -Original Message-
 From: Stephen Pope [mailto:stephen.p...@quest.com]
 Sent: Wednesday, July 13, 2011 9:32 AM
 To: user@cassandra.apache.org
 Subject: BulkLoader

  I'm trying to figure out how to use the BulkLoader, and it looks like 
 there's no way to run it against a local machine, because of this:

                Set<InetAddress> hosts = Gossiper.instance.getLiveMembers();
                hosts.remove(FBUtilities.getLocalAddress());
                if (hosts.isEmpty())
                    throw new IllegalStateException("Cannot load any sstable, no live member found in the cluster");

  Is this intended behavior? May I ask why? We'd like to be able to run it 
 against the local machine.

  Cheers,
  Steve




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


RE: BulkLoader

2011-07-13 Thread Stephen Pope
 Ahhh..ok. Thanks.

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Wednesday, July 13, 2011 11:35 AM
To: user@cassandra.apache.org
Subject: Re: BulkLoader

Because it's hooking directly into gossip, the local instance it's
ignoring is the bulkloader process, not Cassandra.

You'd need to run the bulkloader from a different IP than Cassandra.

On Wed, Jul 13, 2011 at 8:22 AM, Stephen Pope stephen.p...@quest.com wrote:
  Fair enough. My original question stands then. :)

  Why aren't you allowed to talk to a local installation using BulkLoader?

 -Original Message-
 From: Jonathan Ellis [mailto:jbel...@gmail.com]
 Sent: Wednesday, July 13, 2011 11:06 AM
 To: user@cassandra.apache.org
 Subject: Re: BulkLoader

 Sure, that will work fine with a single machine.  The advantage of
 bulkloader is it handles splitting the sstable up and sending each
 piece to the right place(s) when you have more than one.

 On Wed, Jul 13, 2011 at 7:47 AM, Stephen Pope stephen.p...@quest.com wrote:
  I think I've solved my own problem here. After generating the sstable using 
 json2sstable it looks like I can simply copy the created sstable into my 
 data directory.

  Can anyone think of any potential problems with doing it this way?

 -Original Message-
 From: Stephen Pope [mailto:stephen.p...@quest.com]
 Sent: Wednesday, July 13, 2011 9:32 AM
 To: user@cassandra.apache.org
 Subject: BulkLoader

  I'm trying to figure out how to use the BulkLoader, and it looks like 
 there's no way to run it against a local machine, because of this:

                Set<InetAddress> hosts = Gossiper.instance.getLiveMembers();
                hosts.remove(FBUtilities.getLocalAddress());
                if (hosts.isEmpty())
                    throw new IllegalStateException("Cannot load any sstable, no live member found in the cluster");

  Is this intended behavior? May I ask why? We'd like to be able to run it 
 against the local machine.

  Cheers,
  Steve




 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


sstabletojson

2011-07-12 Thread Stephen Pope
 Hey there. I'm trying to convert one of my sstables to json, but it doesn't 
appear to be escaping quotes. As a result, I've got a line in my resulting json 
like this:

3230303930373139313734303236efbfbf3331313733: [[6d6573736167655f6964, 
66AA9165386616028BD3FECF893BBAC204347F3BAF@CONFLICT,6.HUSHEDFIRE.COM, 
634447747524175316]],

 Attempting to convert this json back into an sstable results in:

C:\cassandra\apache-cassandra-0.8.0\bin>json2sstable.bat -K BIM -c 
TransactionLogs json.dat out.db

org.codehaus.jackson.JsonParseException: Unexpected character ('<' (code 60)): 
was expecting comma to separate ARRAY entries
 at [Source: json.dat; line: 31175, column: 299]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:929)
at org.codehaus.jackson.impl.JsonParserBase._reportError(JsonParserBase.
java:632)
at org.codehaus.jackson.impl.JsonParserBase._reportUnexpectedChar(JsonPa
rserBase.java:565)
at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser
.java:128)
at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.mapArray(Unt
ypedObjectDeserializer.java:81)
at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.deserialize(
UntypedObjectDeserializer.java:62)
at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.mapArray(Unt
ypedObjectDeserializer.java:82)
at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.deserialize(
UntypedObjectDeserializer.java:62)
at org.codehaus.jackson.map.deser.MapDeserializer._readAndBind(MapDeseri
alizer.java:197)
at org.codehaus.jackson.map.deser.MapDeserializer.deserialize(MapDeseria
lizer.java:145)
at org.codehaus.jackson.map.deser.MapDeserializer.deserialize(MapDeseria
lizer.java:23)
at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:12
61)
at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:517
)
at org.codehaus.jackson.JsonParser.readValueAs(JsonParser.java:897)
at org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport
.java:263)
at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.jav
a:252)
at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:476)


 Is there anything I can do with my data to fix this?

 Cheers,
 Steve


bulk load

2011-06-22 Thread Stephen Pope
According to the README.txt in examples/bmt, BinaryMemtable is being deprecated. 
What's the recommended way to do bulk loading?

Cheers,
Steve


RE: bulk load

2011-06-22 Thread Stephen Pope
 Awesome, thanks!

-Original Message-
From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com] 
Sent: Wednesday, June 22, 2011 3:08 PM
To: user@cassandra.apache.org
Subject: Re: bulk load

This ticket's outcome replaces what BMT was supposed to do:
https://issues.apache.org/jira/browse/CASSANDRA-1278

0.8.1 is being voted on now and will hopefully be out in the next day or two.

You can try it out with the 0.8-branch if you want - looking near the bottom of 
the comments on the ticket, it has impressive performance.

On Jun 22, 2011, at 2:00 PM, Stephen Pope wrote:

 According to the README.txt in examples/bmt BinaryMemtable is being 
 deprecated. What's the recommended way to do bulk loading?
  
 Cheers,
 Steve



CommitLog replay

2011-06-21 Thread Stephen Pope
Hi there. This is my first message to the mailing list, so let me know if I'm 
doing it wrong. :)

I've got a single node deployment of 0.8 set up on my windows box. When I 
insert a bunch of data into it, the commitlogs directory doesn't clear upon 
completion (should it?). As a result, when I stop and restart Cassandra it 
replays all the commitlogs, then starts compacting (which seems like it's 
taking a long time). While it's compacting it won't talk to my test client.

Am I doing something wrong?

Cheers,
Steve


RE: CommitLog replay

2011-06-21 Thread Stephen Pope
 I've only got one cf, and haven't changed the default flush expiry period. I'm 
not sure whether the node had fully started. I had to restart my data insertion 
(for other reasons), so I can check the system log once the data 
has finished inserting.

 Do you know off-hand how long the default flush expiry period is?

 Cheers,
 Steve

-Original Message-
From: sc...@scode.org [mailto:sc...@scode.org] On Behalf Of Peter Schuller
Sent: Tuesday, June 21, 2011 9:13 AM
To: user@cassandra.apache.org
Subject: Re: CommitLog replay

 I’ve got a single node deployment of 0.8 set up on my windows box. When I
 insert a bunch of data into it, the commitlogs directory doesn’t clear upon
 completion (should it?).

It is expected that commit logs are retained for a while, and that
there is replay going on when restarting a node. The main way to ensure
that a smaller amount of commit log is active at any given moment is
to ensure that all column families are flushed sufficiently often. This
is because once a column family is flushed, it no longer
necessitates the retention of the commit logs that contain the writes
that were just flushed.

Pay attention to whether you have some CFs that are written
very rarely and won't flush until the flush expiry period passes.

 As a result, when I stop and restart Cassandra it
 replays all the commitlogs, then starts compacting (which seems like it’s
 taking a long time). While it’s compacting it won’t talk to my test client.

That it starts compacting is expected if the data flushed as a result
of the commit log replay triggers compactions. However, compaction does
not imply that the node refuses to talk to clients.

Are you sure the node has fully started? it should log when it starts
up the thrift interface - check system.log.

-- 
/ Peter Schuller