Re: pig + hadoop

2011-04-20 Thread pob
Hi,

everything works fine with Cassandra 0.7.5, but when I tried 0.7.3 other
errors showed up; strangely, the task still finished with success.


2011-04-20 11:45:40,674 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201104201139_0004_m_00_3: Error: java.lang.ClassNotFoundException: org.apache.thrift.TException
  at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:247)
  at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
  at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getLoadFunc(PigInputFormat.java:153)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:105)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
  at org.apache.hadoop.mapred.Child.main(Child.java:170)



2011-04-20 11:45:43,629 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201104201139_0004_m_01_3: org.apache.pig.backend.executionengine.ExecException: ERROR 2044: The type null cannot be collected as a Key type
  at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:143)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:105)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
  at org.apache.hadoop.mapred.Child.main(Child.java:170)


2011-04-20 11:42:49,498 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201104201139_0001_m_00_1: Error: java.lang.ClassNotFoundException: org.apache.commons.lang.ArrayUtils
  at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
  at org.apache.cassandra.utils.ByteBufferUtil.<clinit>(ByteBufferUtil.java:75)
  at org.apache.cassandra.hadoop.pig.CassandraStorage.<clinit>(Unknown Source)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:247)
  at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
  at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getLoadFunc(PigInputFormat.java:153)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:105)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
  at org.apache.hadoop.mapred.Child.main(Child.java:170)





2011/4/20 Jeremy Hanna jeremy.hanna1...@gmail.com

 Just as an example:

  <property>
    <name>cassandra.thrift.address</name>
    <value>10.12.34.56</value>
  </property>
  <property>
    <name>cassandra.thrift.port</name>
    <value>9160</value>
  </property>
  <property>
    <name>cassandra.partitioner.class</name>
    <value>org.apache.cassandra.dht.RandomPartitioner</value>
  </property>


 On Apr 19, 2011, at 10:28 PM, Jeremy Hanna wrote:

  oh yeah - that's what's going on.  what I do is on the machine that I run
 the pig script from, I set the PIG_CONF variable to my HADOOP_HOME/conf
 directory and in my mapred-site.xml file found there, I set the three
 variables.
 
  I don't use environment variables when I run against a cluster.
 
  On Apr 19, 2011, at 9:54 PM, Jeffrey Wang wrote:
 
  Did you set PIG_RPC_PORT in your hadoop-env.sh? I was seeing this error
 for a while before I added that.
 
  -Jeffrey
 
  From: pob [mailto:peterob...@gmail.com]
  Sent: Tuesday, 
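
A minimal sketch of the setup described above (the PIG_CONF variable name comes from the thread; the paths and script name are placeholders, not from the original messages):

  # point Pig at the Hadoop conf directory that contains the mapred-site.xml
  # with the cassandra.thrift.address/port/partitioner properties shown above
  export PIG_CONF=$HADOOP_HOME/conf
  pig -x mapreduce my_script.pig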

Re: pig + hadoop

2011-04-20 Thread pob
my fault,

ignore last post.


2011/4/20 pob peterob...@gmail.com


Re: Tombstones and memtable_operations

2011-04-20 Thread aaron morton
Looks like a bug, I've added a patch here 
https://issues.apache.org/jira/browse/CASSANDRA-2519

Aaron

On 20 Apr 2011, at 13:15, aaron morton wrote:

 That's what I was looking for, thanks.
 
 At first glance the behaviour looks inconsistent: we count the number of
 columns in the delete mutation, but when deleting a whole row the column count
 is zero. I'll try to take a look later.
 
 In the meantime you can force a memtable flush via JConsole: navigate down to
 the CF and look for the forceFlush() operation.
 
 Aaron
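
 For reference, the same flush can usually be triggered from the command line
 as well; a hedged sketch, with the keyspace and column family names as
 placeholders:

   bin/nodetool -h localhost flush MyKeyspace MyColumnFamily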
 On 20 Apr 2011, at 09:39, Héctor Izquierdo Seliva wrote:
 
 On Wed, 20 Apr 2011 at 09:08 +1200, aaron morton wrote:
 Yes, I saw that. 
 
 Wanted to know what issue deletes through pelops means so I can work out 
 what command it's sending to cassandra and hopefully I don't waste my time 
 looking in the wrong place. 
 
 Aaron
 
 
 Oh, sorry. Didn't get what you were asking. I use this code: 
 
 RowDeletor deletor = Pelops.createRowDeletor(keySpace);
 deletor.deleteRow(cf, rowId, ConsistencyLevel.QUORUM);
 
 which seems to be calling
 org.apache.cassandra.thrift.Cassandra.Client.remove.
 
 I hope this is useful
 
 
 



Re: Tombstones and memtable_operations

2011-04-20 Thread Héctor Izquierdo Seliva
On Wed, 20 Apr 2011 at 23:00 +1200, aaron morton wrote:
 Looks like a bug, I've added a patch
 here https://issues.apache.org/jira/browse/CASSANDRA-2519
 
 
 Aaron
 

That was fast! Thanks Aaron




Question about AbstractType class

2011-04-20 Thread Desimpel, Ignace
Cassandra version 0.7.4

 

Hi,

 

I created my own java class as an extension of the AbstractType class.
But I'm not sure about the following items related to the compare
function :

# The remaining bytes of the buffer is sometimes zero during thrift
get_slice execution; however, I never store any zero-length column name,
nor do I query for one. If this is normal, what would be the correct handling
of zero remaining bytes? Would it be something like:

public int compare(ByteBuffer o1, ByteBuffer o2) {
    int ar1Rem = o1.remaining();
    int ar2Rem = o2.remaining();
    if (ar1Rem == 0 || ar2Rem == 0) {
        if (ar1Rem != 0) {
            return 1;
        } else if (ar2Rem != 0) {
            return -1;
        } else {
            return 0;
        }
    }
    // Add the real compare here
    ...
}

 

# Since in version 0.6.3 the same function was passed an array of
bytes, I assumed that I could now call the ByteBuffer.array() function
in order to get the array of bytes backing the ByteBuffer. Also, the
length of the byte array in 0.6.3 always seemed to correspond to the
bytes of the column name stored. But now, in version 0.7.4, that ByteBuffer is
not always backed by such an array.

I can still get around this by making the needed buffer myself, like:

int ar2Rem = o2.remaining();
byte[] ar2 = new byte[ar2Rem];
o2.get(ar2, 0, ar2Rem);

Question is: are the remaining bytes the actual bytes for this column
name (e.g. 20 bytes), or could that ByteBuffer ever be a wrapper around
some larger stream of data, where the number of remaining bytes could be
10 MB? In that case I would not be able to detect the end of the column to
compare, and I would possibly be allocating a large, unneeded byte array.

 

#Using the ByteBuffer's 'get' function also updates the position of the
ByteBuffer. Is the compare function expected to do that or should it
reset the position back to what it was or ...?

 

 

Or maybe there is some good documentation I should read?

 

Ignace

 



Re: Different result after restart

2011-04-20 Thread aaron morton
Checking the simple things first: are you using the
o.a.c.service.EmbeddedCassandraService or the o.a.c.EmbeddedServer in the unit
test directory? The latter deletes data, but it does not sound like you are
using it.

When the server starts it should read any commit logs, roll them forward and
then flush all the changes to SSTables. This will result in the log files being
deleted from disk, and you should see INFO level log messages that say
Discarding obsolete commit log:...

Do you get new SSTables written at start up ? 

If you wanted to confirm the data was there take a look at bin/sstable2json 

Hope that helps. 
Aaron
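
A hedged sketch of such a check (the path is a placeholder; this assumes the
tool is run from the Cassandra install directory so it can pick up the node's
configuration):

  bin/sstable2json /var/lib/cassandra/data/MyKeyspace/MyCF-f-1-Data.db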



On 20 Apr 2011, at 23:00, Desimpel, Ignace wrote:

 Cassandra version 0.7.4
  
 Hi,
  
 I’m storing (no deletion) in a small test some records to an embedded 
 Cassandra instance.
 Then I connect using Thrift and I can retrieve the data as expected.
  
 Then I restart the server with the embedded Cassandra, reconnect using Thrift,
 but now the same query gives me no results at all.
 After the restart the commitlog directory gets cleared, leaving only a small log
 and a small log.header file. The data directory for the keyspace is still
 present, together with the db files corresponding to the column families.
 Any idea what I would be doing wrong here?
  
  
 Ignace



Re: Question about AbstractType class

2011-04-20 Thread Sylvain Lebresne
On Wed, Apr 20, 2011 at 1:35 PM, Desimpel, Ignace
ignace.desim...@nuance.com wrote:
 Cassandra version 0.7.4



 Hi,



 I created my own java class as an extension of the AbstractType class. But
 I’m not sure about the following items related to the compare function :

 # The remaining bytes of the buffer sometimes is zero during thrift
 get_slice execution, however I never store any zero length column name nor
 query for it . If normal, what would be the correct handling of the zero
 remaining bytes?

It is normal: the empty ByteBuffer is used in slice queries to indicate the
beginning of the row (start=""). More generally, compare and validate
should work for anything you store, but also for anything you provide as the
'start' and 'end' arguments of slices.

 Would it be something like :

 public int compare(ByteBuffer o1, ByteBuffer o2) {
     int ar1Rem = o1.remaining();
     int ar2Rem = o2.remaining();
     if (ar1Rem == 0 || ar2Rem == 0) {
         if (ar1Rem != 0) {
             return 1;
         } else if (ar2Rem != 0) {
             return -1;
         } else {
             return 0;
         }
     }
     // Add the real compare here
     ...
 }

That looks reasonable (though not optimal in the number of comparisons :))

 # Since in version 0.6.3 the same function was passing an array of bytes, I
 assumed that I could now call the ByteBuffer.array() function in order to
 get the array of bytes backing up the ByteBuffer.

It's not that simple. First, even if you use ByteBuffer.array(),
you'll have to be
careful that the ByteBuffer has a position, a limit and an arrayOffset and you
should take that into account when accessing the backing array. But there is
also no guarantee that the ByteBuffer will have a backing array so you need to
handle this case too (I refer you to the ByteBuffer documentation).

 Also the length of the
 byte array in 0.6.3 seemed always to correspond to the bytes of column name
 stored. But now in version 0.7.4 that ByteBuffer is not always backed by
 such an array.

 I can still get around this by making the needed buffer myself like :

 int ar2Rem = o2.remaining();
 byte[] ar2 = new byte[ar2Rem];
 o2.get(ar2, 0, ar2Rem);

 Question is : Are the remaining bytes the actual bytes for this column name
 (eg: 20 bytes) or would that ByteBuffer ever be some wrapper around some
 larger stream of data and the remaining bytes number could be 10 M bytes.
 Thus I would not be able to detect the end of the column to compare and I
 would possibly be allocating a large unneeded byte array?

As said above, the remaining bytes won't (always) be the actual bytes.

 #Using the ByteBuffer’s ‘get’ function also updates the position of the
 ByteBuffer. Is the compare function expected to do that or should it reset
 the position back to what it was or …?

Neither. You should *not* use any function that changes the ByteBuffer position.
That is, changing it and resetting it afterward is *not* ok.
Instead you should use only the absolute get() methods, which do not change
the position at all.
Or, you can start your compare function by calling BB.duplicate() on both
buffers and then you're free to change the position of the duplicates.

--
Sylvain
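
As a minimal illustration of that advice (not code from the thread; the
java.nio.ByteBuffer-based comparator below and its unsigned byte ordering are
assumptions), the compare function can work on duplicates so the callers'
buffers are never disturbed:

  public int compare(ByteBuffer o1, ByteBuffer o2) {
      // duplicate() shares the bytes but has an independent position/limit,
      // so advancing these never affects the buffers Cassandra handed us
      ByteBuffer b1 = o1.duplicate();
      ByteBuffer b2 = o2.duplicate();
      // empty buffers are legal (slice start/end markers) and sort first
      if (b1.remaining() == 0 || b2.remaining() == 0)
          return b1.remaining() - b2.remaining();
      while (b1.hasRemaining() && b2.hasRemaining()) {
          int cmp = (b1.get() & 0xFF) - (b2.get() & 0xFF); // unsigned byte compare
          if (cmp != 0)
              return cmp;
      }
      return b1.remaining() - b2.remaining(); // shorter buffer sorts first
  }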


RE: Different result after restart

2011-04-20 Thread Desimpel, Ignace
I'm using the org.apache.cassandra.thrift.CassandraDaemon
implementation. I did the same with version 0.6.x but have now modified
the code for version 0.7.4. I could restart without problems in 0.6.x.

 

I get (did not add them all) the following messages :

(Keyspace is 'SearchSpace', CF names are like 'ReverseStringValues',
'ReverseLabelValues', 'Structure', ...)

 

2011-04-20 12:27:48 INFO  AbstractCassandraDaemon - Logging initialized

2011-04-20 12:27:48 INFO  AbstractCassandraDaemon - Heap size:
10719985664/10719985664

2011-04-20 12:27:48 WARN  CLibrary - Obsolete version of JNA present;
unable to register C library. Upgrade to JNA 3.2.7 or later

2011-04-20 12:27:48 INFO  DatabaseDescriptor - Loading settings from
file:C:/develop/configs/AnnotationServer7/properties/cassandra.yaml

2011-04-20 12:27:48 INFO  DatabaseDescriptor - DiskAccessMode 'auto'
determined to be mmap, indexAccessMode is mmap

...

2011-04-20 12:27:49 INFO  CommitLogSegment - Creating new commitlog
segment ../cassandra/dbcommitlog\CommitLog-1303295269011.log

2011-04-20 12:27:49 INFO  CommitLog - Replaying
..\cassandra\dbcommitlog\CommitLog-1303294884815.log

2011-04-20 12:27:54 INFO  CommitLog - Finished reading
..\cassandra\dbcommitlog\CommitLog-1303294884815.log

2011-04-20 12:27:54 INFO  ColumnFamilyStore - Enqueuing flush of
Memtable-ReverseStringValues@249550264(8519832 bytes, 189992 operations)

2011-04-20 12:27:54 INFO  Memtable - Writing
Memtable-ReverseStringValues@249550264(8519832 bytes, 189992 operations)

2011-04-20 12:27:54 INFO  ColumnFamilyStore - Enqueuing flush of
Memtable-ReverseLabelValues@1617914474(1339548 bytes, 31894 operations)

2011-04-20 12:27:54 INFO  Memtable - Writing
Memtable-ReverseLabelValues@1617914474(1339548 bytes, 31894 operations)

2011-04-20 12:27:55 INFO  ColumnFamilyStore - Enqueuing flush of
Memtable-ForwardLabelValues@1924550782(1339548 bytes, 31894 operations)

2011-04-20 12:27:55 INFO  Memtable - Writing
Memtable-ForwardLabelValues@1924550782(1339548 bytes, 31894 operations)

...

2011-04-20 12:27:58 INFO  CommitLog - Log replay complete



2011-04-20 12:27:57 INFO  CompactionManager - Compacting
[SSTableReader(path='..\cassandra\dbdatafile\SearchSpace\Structure-f-1-Data.db'),SSTableReader(path='..\cassandra\dbdatafile\SearchSpace\Structure-f-2-Data.db')]

...

2011-04-20 12:27:57 INFO  ColumnFamilyStore - Enqueuing flush of
Memtable-ReverseDoubleValues@1985313813(56946 bytes, 1265 operations)

2011-04-20 12:27:57 INFO  Memtable - Writing
Memtable-ReverseDoubleValues@1985313813(56946 bytes, 1265 operations)

2011-04-20 12:27:57 INFO  ColumnFamilyStore - Enqueuing flush of
Memtable-Documents@1715831652(1872209 bytes, 36 operations)

2011-04-20 12:27:57 INFO  Memtable - Writing
Memtable-Documents@1715831652(1872209 bytes, 36 operations)

 

2011-04-20 12:27:59 INFO  CompactionManager - Compacting
[SSTableReader(path='..\cassandra\dbdatafile\system\LocationInfo-f-1-Data.db'),SSTableReader(path='..\cassandra\dbdatafile\system\LocationInfo-f-2-Data.db'),SSTableReader(path='..\cassandra\dbdatafile\system\LocationInfo-f-3-Data.db'),SSTableReader(path='..\cassandra\dbdatafile\system\LocationInfo-f-4-Data.db')]

2011-04-20 12:27:59 INFO  Mx4jTool - Will not load MX4J, mx4j-tools.jar
is not in the classpath

2011-04-20 12:27:59 INFO  CassandraDaemon - Binding thrift service to
GH-DSK0178.nuance.com/10.184.56.115:9160

2011-04-20 12:27:59 INFO  CassandraDaemon - Listening for thrift
clients...

2011-04-20 12:27:59 INFO  CompactionManager - Compacted to
..\cassandra\dbdatafile\system\LocationInfo-tmp-f-5-Data.db.  751 to 457
(~60% of original) bytes for 3 keys.  Time: 370ms.

2011-04-20 12:28:02 INFO  SearchServer - Annotation labels present.
Count labels : 568

 

I do not get any message saying 'Discarding obsolete ..'. The replayed
commit file, however, is deleted. The only one left is the new
CommitLog-1303295269011, but I thought that is normal at restart.

Some data is still there and can be queried, which is why there is
my own message saying '..Annotation labels present. Count labels :
568...'

The column family I wanted to access in my query is ForwardLabelValues
(also present in the log extract here), and the size of its file on
disk is 3.5 MB. Also, the query I specify is one that should get all
the 'records'.

 

I did try the sstable2json but must be doing something wrong. I got :

 

sstable2json C:\develop\configs\AnnotationServer7\cassandra\dbdatafile\SearchSpace\ForwardLabelValues-f-1-Data.db

no non-system tables are defined

Exception in thread "main" org.apache.cassandra.config.ConfigurationException: no non-system tables are defined
        at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:457)

 

Thanks,

Ignace

 

From: aaron morton [mailto:aa...@thelastpickle.com] 
Sent: Wednesday, April 20, 2011 1:40 PM
To: user@cassandra.apache.org
Subject: Re: Different result after restart

 

Checking the simple things first, are you using the

RE: Question about AbstractType class

2011-04-20 Thread Desimpel, Ignace


-Original Message-
From: Sylvain Lebresne [mailto:sylv...@datastax.com] 
Sent: Wednesday, April 20, 2011 2:07 PM
To: user@cassandra.apache.org
Subject: Re: Question about AbstractType class

On Wed, Apr 20, 2011 at 1:35 PM, Desimpel, Ignace ignace.desim...@nuance.com 
wrote:
 Cassandra version 0.7.4



 Hi,



 I created my own java class as an extension of the AbstractType class. 
 But I'm not sure about the following items related to the compare function :

 # The remaining bytes of the buffer sometimes is zero during thrift 
 get_slice execution, however I never store any zero length column name 
 nor query for it . If normal, what would be the correct handling of 
 the zero remaining bytes?

It is normal, the empty ByteBuffer is used in slice queries to indicate the 
beginning of the row (start=). More generally, compare and validate should 
work for anything you store but also anything you provide for the 'start'
and 'end' argument of slices.

 Would it be something like :

 public int compare(ByteBuffer o1, ByteBuffer o2) {
     int ar1Rem = o1.remaining();
     int ar2Rem = o2.remaining();
     if (ar1Rem == 0 || ar2Rem == 0) {
         if (ar1Rem != 0) {
             return 1;
         } else if (ar2Rem != 0) {
             return -1;
         } else {
             return 0;
         }
     }
     // Add the real compare here
     ...
 }

That looks reasonable (though not optimal in the number of comparison :))
-OK

 # Since in version 0.6.3 the same function was passing an array of 
 bytes, I assumed that I could now call the ByteBuffer.array() function 
 in order to get the array of bytes backing up the ByteBuffer.

It's not that simple. First, even if you use ByteBuffer.array(), you'll have to 
be careful that the ByteBuffer has a position, a limit and an arrayOffset and 
you should take that into account when accessing the backing array. But there 
is also no guarantee that the ByteBuffer will have a backing array so you need 
to handle this case too (I refer you to the ByteBuffer documentation).
-OK

 Also the length of the
 byte array in 0.6.3 seemed always to correspond to the bytes of column 
 name stored. But now in version 0.7.4 that ByteBuffer is not always 
 backed by such an array.

 I can still get around this by making the needed buffer myself like :

 int ar2Rem = o2.remaining();
 byte[] ar2 = new byte[ar2Rem];
 o2.get(ar2, 0, ar2Rem);

 Question is : Are the remaining bytes the actual bytes for this column 
 name
 (eg: 20 bytes) or would that ByteBuffer ever be some wrapper around 
 some larger stream of data and the remaining bytes number could be 10 M bytes.
 Thus I would not be able to detect the end of the column to compare 
 and I would possibly be allocating a large unneeded byte array?

As said above, the remaining bytes won't (always) be the actual bytes.
-Then how do I know the end is near? E.g., if the stored value is a char
string, it would be nice to know where it ends, unless I also store the length
before the char string.
-Assuming that both ByteBuffers have the same data and the same position and
limit, and thus the same remaining, one can imagine a loop comparing each byte
until the remaining is used up. Then I cannot get any more data, and thus I
should return 0?

 #Using the ByteBuffer's 'get' function also updates the position of 
 the ByteBuffer. Is the compare function expected to do that or should 
 it reset the position back to what it was or ...?

Neither. You should *not* use any function that change the ByteBuffer position.
That is, changing it and resetting it afterward is *not* ok.
-OK
Instead you should only use only the absolute get() methods, that do not change 
the position at all.
Or, you start your compare function by calling BB.duplicate() on both buffers 
and then you're free to change the position of the duplicates.
-OK

--
Sylvain

Thanks Sylvain!


RE: Different result after restart

2011-04-20 Thread Desimpel, Ignace
Aaron,

 

Already found out what the problem was: I was using an AbstractType
comparator for a column family, and that code was changing the given
ByteBuffer position, which it was not supposed to do (hinted by Sylvain
Lebresne!).

Anyway, after correcting that problem I got back the results as before.
I still don't grasp how this relates to the restart of the server, but
I'm happy as is.

 

Thanks very much Aaron!

 

Ignace

 

 

From: aaron morton [mailto:aa...@thelastpickle.com] 
Sent: Wednesday, April 20, 2011 1:40 PM
To: user@cassandra.apache.org
Subject: Re: Different result after restart

 

Checking the simple things first: are you using the
o.a.c.service.EmbeddedCassandraService or the o.a.c.EmbeddedServer in
the unit test directory? The latter deletes data, but it does not sound
like you are using it.

 

When the server starts it should read any commit logs, roll them forward
and then flush all the changes to SSTables. This will result in the log
files being deleted from disk, and you should see INFO level log messages
that say Discarding obsolete commit log:...

 

Do you get new SSTables written at start up ? 

 

If you wanted to confirm the data was there take a look at
bin/sstable2json 

 

Hope that helps. 

Aaron

 

 

 

On 20 Apr 2011, at 23:00, Desimpel, Ignace wrote:





Cassandra version 0.7.4

 

Hi,

 

I'm storing (no deletion) in a small test some records to an embedded
Cassandra instance.

Then I connect using Thrift and I can retrieve the data as expected.

 

Then I restart the server with the embedded Cassandra, reconnect using
Thrift, but now the same query gives me no results at all.

After the restart the commitlog directory gets cleared, leaving only a small
log and a small log.header file. The data directory for the keyspace is
still present, together with the db files corresponding to the column
families.

Any idea what I would be doing wrong here?

 

 

Ignace

 



Re: Question about AbstractType class

2011-04-20 Thread Sylvain Lebresne
On Wed, Apr 20, 2011 at 3:06 PM, Desimpel, Ignace
ignace.desim...@nuance.com wrote:
 As said above, the remaining bytes won't (always) be the actual bytes.

Sorry I answered a bit quickly, I meant to say that the actual bytes
won't (always) be the full backing array.
That is, we never guarantee that BB.arrayOffset() == 0, nor
BB.position() == 0, nor BB.limit() == backingArray.length.
But the remaining() bytes will be the actual bytes, my bad.

--
Sylvain
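
A small sketch of reading those bytes safely (illustrative only; the helper
name is hypothetical and not from the thread), honoring position, arrayOffset
and limit and never moving the buffer's position:

  static byte[] copyBytes(ByteBuffer bb) {
      byte[] out = new byte[bb.remaining()];
      if (bb.hasArray()) {
          // the backing array may be shared and offset, so index from
          // arrayOffset() + position()
          System.arraycopy(bb.array(), bb.arrayOffset() + bb.position(), out, 0, out.length);
      } else {
          for (int i = 0; i < out.length; i++)
              out[i] = bb.get(bb.position() + i); // absolute get, position unchanged
      }
      return out;
  }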


RE: Question about AbstractType class

2011-04-20 Thread Desimpel, Ignace
Thanks Sylvain. Your answer already helped me out a lot! I was using a
ByteBuffer.get function that changes the ByteBuffer's position, and
I got all kinds of strange effects and exceptions I didn't get in
0.6.x. Changed that code and all problems are gone...

Many thanks!!
Ignace

-Original Message-
From: Sylvain Lebresne [mailto:sylv...@datastax.com] 
Sent: Wednesday, April 20, 2011 4:04 PM
To: user@cassandra.apache.org
Subject: Re: Question about AbstractType class

On Wed, Apr 20, 2011 at 3:06 PM, Desimpel, Ignace
ignace.desim...@nuance.com wrote:
 As said above, the remaining bytes won't (always) be the actual bytes.

Sorry I answered a bit quickly, I meant to say that the actual bytes
won't (always) be the full backing array.
That is, we never guarantee that BB.arrayOffset() == 0, nor
BB.position() == 0, nor BB.limit() == backingArray.length.
But the remaining() bytes will be the actual bytes, my bad.

--
Sylvain


NotSerializableException of FutureTask

2011-04-20 Thread Desimpel, Ignace
Using my own JMX Java code, and also when using NodeTool, I get the following
exception when calling the forceFlush function.

But it seems that the flushing itself is started even though the exception
occurred.

Any idea?

(running jdk 1.6, 64 bits)

 

Ignace

 

2011-04-20 16:23:45 INFO  ColumnFamilyStore - Enqueuing flush of
Memtable-ReverseIntegerValues@75939304(2274472 bytes, 48892 operations)

2011-04-20 16:23:45 INFO  Memtable - Writing
Memtable-ReverseIntegerValues@75939304(2274472 bytes, 48892 operations)

java.rmi.UnmarshalException: error unmarshalling return; nested exception is:
  java.io.WriteAbortedException: writing aborted; java.io.NotSerializableException: java.util.concurrent.FutureTask
  at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:173)
  at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
  at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown Source)
  at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:993)
  at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:288)
  at $Proxy7.forceFlush(Unknown Source)
  at be.landc.services.search.server.db.indexsearch.store.cassandra.CassandraStore$CassNodeProbe.doRepairAll(CassandraStore.java:160)
  at be.landc.services.search.server.db.indexsearch.store.cassandra.CassandraStore$CassNodeProbe.run(CassandraStore.java:141)
Caused by: java.io.WriteAbortedException: writing aborted; java.io.NotSerializableException: java.util.concurrent.FutureTask
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1332)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
  at sun.rmi.server.UnicastRef.unmarshalValue(UnicastRef.java:306)
  at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:155)
  ... 7 more
Caused by: java.io.NotSerializableException: java.util.concurrent.FutureTask
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1164)
  at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:330)
  at sun.rmi.server.UnicastRef.marshalValue(UnicastRef.java:274)
  at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:315)
  at sun.rmi.transport.Transport$1.run(Transport.java:159)
  at java.security.AccessController.doPrivileged(Native Method)
  at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
  at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
  at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
  at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)

2011-04-20 16:23:45 INFO  SearchServer - == Starting flush for column
family : SearchSpace / ForwardLongValues

2011-04-20 16:23:45 INFO  ColumnFamilyStore - Enqueuing flush of
Memtable-ForwardLongValues@710396564(26780468 bytes, 623958 operations)



Re: cluster IP question and Jconsole?

2011-04-20 Thread tinhuty he

Maki,

Yes, you are right: 8081 is the mx4j port; the JMX_PORT is 8001 in
cassandra-env.sh.

On the cassandra Linux server itself, I can run this successfully:
nodetool -host x -p 8001 ring
(x is the actual IP address)

However, when I run the same command on another Windows machine (which has the
cassandra Windows version extracted), I get the exception below. One thing that
puzzled me is that the command is trying to connect to IP x, but the
exception claims "Connection refused to host: 127.0.0.1". Is there
anything else that I need to configure? I guess this is probably the
reason jconsole can't connect to port 8001 remotely either? Thanks for
any advice!


D:\apache-cassandra-0.7.4\binnodetool -host x -p 8001 ring
Starting NodeTool
Error connection to remote JMX agent!
java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested 
exception is:

   java.net.ConnectException: Connection refused: connect
   at sun.rmi.transport.tcp.TCPEndpoint.newSocket(Unknown Source)
   at sun.rmi.transport.tcp.TCPChannel.createConnection(Unknown Source)
   at sun.rmi.transport.tcp.TCPChannel.newConnection(Unknown Source)
   at sun.rmi.server.UnicastRef.invoke(Unknown Source)
   at javax.management.remote.rmi.RMIServerImpl_Stub.newClient(Unknown 
Source)
   at javax.management.remote.rmi.RMIConnector.getConnection(Unknown 
Source)

   at javax.management.remote.rmi.RMIConnector.connect(Unknown Source)
   at javax.management.remote.JMXConnectorFactory.connect(Unknown 
Source)

   at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:137)
   at org.apache.cassandra.tools.NodeProbe.init(NodeProbe.java:107)
   at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:511)
Caused by: java.net.ConnectException: Connection refused: connect
   at java.net.PlainSocketImpl.socketConnect(Native Method)
   at java.net.PlainSocketImpl.doConnect(Unknown Source)
   at java.net.PlainSocketImpl.connectToAddress(Unknown Source)
   at java.net.PlainSocketImpl.connect(Unknown Source)
   at java.net.SocksSocketImpl.connect(Unknown Source)
   at java.net.Socket.connect(Unknown Source)
   at java.net.Socket.connect(Unknown Source)
   at java.net.Socket.init(Unknown Source)
   at java.net.Socket.init(Unknown Source)
   at 
sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(Unknown Source)
   at 
sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(Unknown Source)

   ... 11 more



-Original Message- 
From: Watanabe Maki

Sent: Saturday, April 16, 2011 1:45 AM
To: user@cassandra.apache.org
Cc: user@cassandra.apache.org
Subject: Re: cluster IP question and Jconsole?

8081 is your mx4j port, isn't it? You need to connect jconsole to JMX_PORT 
specified in cassandra-env.sh.


maki


From iPhone



On 2011/04/16, at 13:56, tinhuty he tinh...@hotmail.com wrote:

Maki, thanks for your reply. for the second question, I wasn't using the 
loopback address, I was using the actually IP address for that server. I 
am able to telnet to that IP on port 8081, but using jconsole failed.


-Original Message- From: Maki Watanabe
Sent: Friday, April 15, 2011 9:43 PM
To: user@cassandra.apache.org
Cc: tinhuty he
Subject: Re: cluster IP question and Jconsole?

127.0.0.2 to 127.0.0.5 are valid IP addresses. Those are just alias
addresses for your loopback interface.
Verify:
% ifconfig -a

127.0.0.0/8 is for loopback, so you can't connect this address from
remote machines.
You may be able to configure SSH port forwarding from your monitoring
host to the cassandra node, though I haven't tried it.

maki
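
A hedged sketch of that port-forwarding idea (untested; host name and port are
placeholders taken from this thread's JMX_PORT). Note that plain JMX/RMI also
opens a second, dynamically chosen port, so a single tunnel may not be enough
on its own:

  ssh -L 8001:127.0.0.1:8001 user@cassandra-host
  # then point jconsole (or nodetool) at localhost:8001 on the monitoring machine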

2011/4/16 tinhuty he tinh...@hotmail.com:

I have followed the description here
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/lauching_5_node_cassandra_clusters
to create 5 instances of cassandra on one CentOS 5.5 machine. Using
nodetool shows the 5 nodes are all running fine.

Note the 5 nodes are using IPs 127.0.0.1 to 127.0.0.5. I understand 127.0.0.1
points to the local server, but what about 127.0.0.2 to 127.0.0.5? They look
like invalid IPs to me; how come all 5 nodes are working ok?

Another question. I have installed MX4J in instance 127.0.0.1 on port 8081.
I am able to connect to http://server:8081/ from the browser. However, how do
I connect using Jconsole installed on another Windows machine? (My CentOS 5.5
doesn't have X installed; only SSH is allowed.)

Thanks.




Re: Ec2Snitch + NetworkTopologyStrategy if only in one region?

2011-04-20 Thread William Oberman
Also for the new users like me, don't assume DC1 is a keyword like I did.  A
working example of a keyspace in EC2 is:

create keyspace test with replication_factor=3 and strategy_options =
[{us-east:3}] and
placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy';

For a single DC in EC2 deployment.  I felt silly afterwards, but I couldn't
find official docs on the structure of strategy_options anywhere.

will
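
If the cluster ever does span two regions, a purely hypothetical variant of the
same statement (the second region name and the counts are made up, not from the
thread) would simply list each region in strategy_options:

  create keyspace test with placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'
    and strategy_options=[{us-east:2, us-west:1}];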

On Wed, Apr 13, 2011 at 5:14 PM, William Oberman
ober...@civicscience.comwrote:

 One last coda, for other noobs to cassandra like me.  If you use
 NetworkTopologyStrategy with replication_factor > 1, make sure you have EC2
 instances in multiple availability zones.  I was doing baby steps, and tried
 doing a cluster in one AZ (before spreading to multiple AZs) and was getting
 the most baffling errors (cassandra_UnavailableException).  I finally
 thought to check the cassandra server logs (after painstakingly debugging the
 client code, firewalls, etc. for connectivity problems), and it turns out
 my cassandra cluster was considering itself unavailable because it couldn't
 replicate as much as it wanted to.  I kind of wish a different word than
 unavailable had been chosen for this error condition :-)

 will


 On Tue, Apr 12, 2011 at 10:37 PM, aaron morton aa...@thelastpickle.comwrote:

 If you can use standard + encoded I would go with that.

 Aaron

 On 13 Apr 2011, at 07:07, William Oberman wrote:

 Excellent to know! (and yes, I figure I'll expand someday, so I'm glad I
 found this out before digging a hole).

 The other issue I've been pondering is a normal column family of encoded
 objects (in my case JSON) vs. a super column.  Based on my use case, things
 I've read, etc...  right now I'm coming down on normal + encoded.

 will

 On Tue, Apr 12, 2011 at 2:57 PM, Jonathan Ellis jbel...@gmail.comwrote:

 NTS is overkill in the sense that it doesn't really benefit you in a
 single DC, but if you think you may expand to another DC in the future
 it's much simpler if you were already using NTS, than first migrating
 to NTS (changing strategy is painful).

 I can't think of any downsides to using NTS in a single-DC
 environment, so that's the safe option.

 On Tue, Apr 12, 2011 at 1:15 PM, William Oberman
 ober...@civicscience.com wrote:
  Hi,
 
  I'm getting closer to commiting to cassandra, and now I'm in system/IT
  issues and questions.  I'm in the amazon EC2 cloud.  I previously used
 this
  forum to discover the best practice for disk layouts (large instance +
 the
  two ephemeral disks in RAID0 for data + root volume for everything
 else).
  Now I'm hoping to confirm bits and pieces of things I've read about for
  snitch/replication strategies.  I was thinking of using
  endpoint_snitch: org.apache.cassandra.locator.Ec2Snitch
 
 placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'
  (for people hitting this from the mailing list or google, I feel
 obligated
  to note that the former setting is in cassandra.yaml, and the latter is
 an
  option on a keyspace).
 
  But, I'm only in one region. Is using the amazon snitch/networktopology
  overkill given everything I have is in one DC (I believe region==DC and
  availability_zone==rack).  I'm using multiple availability zones for
 some
  level of redundancy, I'm just not yet to the point I'm using multiple
  regions.  If someday I move to using multiple regions, would that
 change the
  answer?
 
  Thanks!
 
  --
  Will Oberman
  Civic Science, Inc.
  3030 Penn Avenue., First Floor
  Pittsburgh, PA 15201
  (M) 412-480-7835
  (E) ober...@civicscience.com
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com




 --
 Will Oberman
 Civic Science, Inc.
 3030 Penn Avenue., First Floor
 Pittsburgh, PA 15201
 (M) 412-480-7835
 (E) ober...@civicscience.com





 --
 Will Oberman
 Civic Science, Inc.
 3030 Penn Avenue., First Floor
 Pittsburgh, PA 15201
 (M) 412-480-7835
 (E) ober...@civicscience.com




-- 
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com


Re: NotSerializableException of FutureTask

2011-04-20 Thread Jonathan Ellis
You must be using an old Cassandra and/or nodetool; current nodetool calls
forceBlockingFlush which does not try to return a Future over JMX.

On Wed, Apr 20, 2011 at 9:38 AM, Desimpel, Ignace
ignace.desim...@nuance.com wrote:
 Using own JMX java code and when using the NodeTool I get the following
 exception when calling the forceFlush function.

 But it seems that the flushing itself is started although the exception
 occurred.

 Any idea?

 (running jdk 1.6, 64 bits)



 Ignace



 2011-04-20 16:23:45 INFO  ColumnFamilyStore - Enqueuing flush of
 Memtable-ReverseIntegerValues@75939304(2274472 bytes, 48892 operations)

 2011-04-20 16:23:45 INFO  Memtable - Writing
 Memtable-ReverseIntegerValues@75939304(2274472 bytes, 48892 operations)

 java.rmi.UnmarshalException: error unmarshalling return; nested exception
 is:

   java.io.WriteAbortedException: writing aborted;
 java.io.NotSerializableException: java.util.concurrent.FutureTask

   at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:173)

   at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)

   at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown
 Source)

   at
 javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:993)

   at
 javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:288)

   at $Proxy7.forceFlush(Unknown Source)

   at
 be.landc.services.search.server.db.indexsearch.store.cassandra.CassandraStore$CassNodeProbe.doRepairAll(CassandraStore.java:160)

   at
 be.landc.services.search.server.db.indexsearch.store.cassandra.CassandraStore$CassNodeProbe.run(CassandraStore.java:141)

 Caused by: java.io.WriteAbortedException: writing aborted;
 java.io.NotSerializableException: java.util.concurrent.FutureTask

   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1332)

   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)

   at sun.rmi.server.UnicastRef.unmarshalValue(UnicastRef.java:306)

   at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:155)

   ... 7 more

 Caused by: java.io.NotSerializableException: java.util.concurrent.FutureTask

   at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1164)

   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:330)

   at sun.rmi.server.UnicastRef.marshalValue(UnicastRef.java:274)

   at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:315)

   at sun.rmi.transport.Transport$1.run(Transport.java:159)

   at java.security.AccessController.doPrivileged(Native Method)

   at sun.rmi.transport.Transport.serviceCall(Transport.java:155)

   at
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)

   at
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)

   at
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)

   at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

   at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

   at java.lang.Thread.run(Thread.java:662)

 2011-04-20 16:23:45 INFO  SearchServer - == Starting flush for column
 family : SearchSpace / ForwardLongValues

 2011-04-20 16:23:45 INFO  ColumnFamilyStore - Enqueuing flush of
 Memtable-ForwardLongValues@710396564(26780468 bytes, 623958 operations)



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: cluster IP question and Jconsole?

2011-04-20 Thread Tyler Hobbs
See the first entry in http://wiki.apache.org/cassandra/JmxGotchas
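
A common fix for this "Connection refused to host: 127.0.0.1" symptom is to
tell the RMI layer which address to advertise to remote clients; a hedged
sketch (the IP is a placeholder) of the kind of line added to cassandra-env.sh:

  JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=10.1.2.3"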

On Wed, Apr 20, 2011 at 9:54 AM, tinhuty he tinh...@hotmail.com wrote:

 Maki,

 Yes you are right, 8081 is mx4j port, the JMX_PORT is 8001 in the
 cassandra-env.sh.

 in the cassandra Linux server itself, I can run this successfully:
 nodetool -host x -p 8001 ring
 x is the actually IP address

 however when I run the same command in another windows machine(which has
 the cassandra windows version extracted), I am getting exception like below,
 one thing puzzled me is that the command trying to connect to ip x, but
 the exception claimed: Connection refused to host: 127.0.0.1. Is there
 anything else that I need to config or...? I guess this is probably the
 reason that jconsole can't connect to port 8001 remotely either? Thanks for
 any advice!

 D:\apache-cassandra-0.7.4\binnodetool -host x -p 8001 ring
 Starting NodeTool
 Error connection to remote JMX agent!
 java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested
 exception is:
   java.net.ConnectException: Connection refused: connect
   at sun.rmi.transport.tcp.TCPEndpoint.newSocket(Unknown Source)
   at sun.rmi.transport.tcp.TCPChannel.createConnection(Unknown Source)
   at sun.rmi.transport.tcp.TCPChannel.newConnection(Unknown Source)
   at sun.rmi.server.UnicastRef.invoke(Unknown Source)
   at javax.management.remote.rmi.RMIServerImpl_Stub.newClient(Unknown
 Source)
   at javax.management.remote.rmi.RMIConnector.getConnection(Unknown
 Source)
   at javax.management.remote.rmi.RMIConnector.connect(Unknown Source)
   at javax.management.remote.JMXConnectorFactory.connect(Unknown
 Source)
   at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:137)
   at org.apache.cassandra.tools.NodeProbe.init(NodeProbe.java:107)
   at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:511)
 Caused by: java.net.ConnectException: Connection refused: connect
   at java.net.PlainSocketImpl.socketConnect(Native Method)
   at java.net.PlainSocketImpl.doConnect(Unknown Source)
   at java.net.PlainSocketImpl.connectToAddress(Unknown Source)
   at java.net.PlainSocketImpl.connect(Unknown Source)
   at java.net.SocksSocketImpl.connect(Unknown Source)
   at java.net.Socket.connect(Unknown Source)
   at java.net.Socket.connect(Unknown Source)
   at java.net.Socket.init(Unknown Source)
   at java.net.Socket.init(Unknown Source)
   at
 sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(Unknown Source)
   at
 sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(Unknown Source)
   ... 11 more



 -Original Message- From: Watanabe Maki
 Sent: Saturday, April 16, 2011 1:45 AM

 To: user@cassandra.apache.org
 Cc: user@cassandra.apache.org

 Subject: Re: cluster IP question and Jconsole?

 8081 is your mx4j port, isn't it? You need to connect jconsole to JMX_PORT
 specified in cassandra-env.sh.

 maki

 From iPhone


 On 2011/04/16, at 13:56, tinhuty he tinh...@hotmail.com wrote:

  Maki, thanks for your reply. for the second question, I wasn't using the
 loopback address, I was using the actually IP address for that server. I am
 able to telnet to that IP on port 8081, but using jconsole failed.

 -Original Message- From: Maki Watanabe
 Sent: Friday, April 15, 2011 9:43 PM
 To: user@cassandra.apache.org
 Cc: tinhuty he
 Subject: Re: cluster IP question and Jconsole?

 127.0.0.2 to 127.0.0.5 are valid IP addresses. Those are just alias
 addresses for your loopback interface.
 Verify:
 % ifconfig -a

 127.0.0.0/8 is for loopback, so you can't connect this address from
 remote machines.
 You may be able configure SSH port forwarding from your monitroing
 host to cassandra node though I haven't try.

 maki

 2011/4/16 tinhuty he tinh...@hotmail.com:

 I have followed the description here

 http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/lauching_5_node_cassandra_clusters
 to created 5 instances of cassandra in one CentOS 5.5 machine. using
 nodetool shows the 5 nodes are all running fine.

 Note the 5 nodes are using IP 127.0.0.1 to 127.0.0.5. I understand
 127.0.0.1
 is pointing to local server, but how about 127.0.0.2 to 127.0.0.5? looks
 to
 me that they are not valid IP? how come all 5 nodes are working ok?

 Another question. I have installed MX4J in instance 127.0.0.1 on port
 8081.
 I am able to connect to http://server:8081/ from the browser. However
 how do
 I connect using Jconsole that was installed in another windows
 machines?(since my CentOS5.5 doesn't have X installed, only SSH allowed).

 Thanks.





-- 
Tyler Hobbs
Software Engineer, DataStax http://datastax.com/
Maintainer of the pycassa http://github.com/pycassa/pycassa Cassandra
Python client library


Ec2 Stress Results

2011-04-20 Thread Alex Araujo
Does anyone have any Ec2 benchmarks/experiences they can share?  I am 
trying to get a sense for what to expect from a production cluster on 
Ec2 so that I can compare my application's performance against a sane 
baseline.  What I have done so far is:


1. Launched a 4 node cluster of m1.xlarge instances in the same 
availability zone using PyStratus 
(https://github.com/digitalreasoning/PyStratus).  Each node has the 
following specs (according to Amazon):

15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage
64-bit platform

2. Changed the default PyStratus directories in order to have commit 
logs on the root partition and data files on ephemeral storage:

commitlog_directory: /var/cassandra-logs
data_file_directories: [/mnt/cassandra-data]

3. Gave each node 10GB of MAX_HEAP; 1GB HEAP_NEWSIZE in
conf/cassandra-env.sh


4. Ran `contrib/stress/bin/stress -d node1,..,node4 -n 10000000 -t 100`
on a separate m1.large instance:

total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
...
9832712,7120,7120,0.004948514851485148,842
9907616,7490,7490,0.0043189949802413755,852
9978357,7074,7074,0.004560353967289125,863
10000000,2164,2164,0.004065933558194335,867

5. Truncated Keyspace1.Standard1:
# /usr/local/apache-cassandra/bin/cassandra-cli -host localhost -port 9160
Connected to: Test Cluster on x.x.x.x/9160
Welcome to cassandra CLI.

Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
[default@unknown] use Keyspace1;
Authenticated to keyspace: Keyspace1
[default@Keyspace1] truncate Standard1;
null

6. Expanded the cluster to 8 nodes using PyStratus and sanity checked
using nodetool:

# /usr/local/apache-cassandra/bin/nodetool -h localhost ring
Address Status State   LoadOwnsToken
x.x.x.x  Up Normal  1.3 GB  12.50%  
21267647932558653966460912964485513216
x.x.x.x   Up Normal  3.06 GB 12.50%  
42535295865117307932921825928971026432
x.x.x.x Up Normal  1.16 GB 12.50%  
63802943797675961899382738893456539648
x.x.x.x   Up Normal  2.43 GB 12.50%  
85070591730234615865843651857942052864
x.x.x.x   Up Normal  1.22 GB 12.50%  
106338239662793269832304564822427566080
x.x.x.xUp Normal  2.74 GB 12.50%  
127605887595351923798765477786913079296
x.x.x.xUp Normal  1.22 GB 12.50%  
148873535527910577765226390751398592512
x.x.x.x   Up Normal  2.57 GB 12.50%  
170141183460469231731687303715884105728


7. Ran `contrib/stress/bin/stress -d node1,..,node8 -n 10000000 -t 100`
on a separate m1.large instance again:

total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
...
9880360,9649,9649,0.003210443956226165,720
9942718,6235,6235,0.003206934154398794,731
9997035,5431,5431,0.0032615939761032457,741
10000000,296,296,0.002660033726812816,742

In a nutshell, 4 nodes inserted at 11,534 writes/sec and 8 nodes 
inserted at 13,477 writes/sec.


Those numbers seem a little low to me, but I don't have anything to 
compare them to.  I'd like to hear others' opinions before I spin my wheels 
with the number of nodes, threads, memtable, memory, and/or GC 
settings.  Cheers, Alex.


system_* consistency level?

2011-04-20 Thread William Oberman
Hi,

My unit tests started failing once I upgraded from a single node cassandra
cluster to a full N node cluster (I'm starting with 4).  I had a few
various bugs, mostly due to forgetting to read/write at a quorum level in
places I needed stronger consistency guarantees.  But I kept getting
random, intermittent failures (the worst kind).  I'm 99% sure I see why,
after some painful debugging, but I don't know what to do about it.  The
basic flaw in my understanding of cassandra seems to boil down to: I thought
system mutations of keyspaces/column families were of a stronger
consistency than ONE, but that appears to not be true.  Any way for me to
update a cluster at something more like QUORUM?

The basic idea is in my unit test.setup() I clone my real keyspace as
keyspace_UUID (with all of the exact same CFs) to get a fresh space to play
in.  In a single node environment, no issues.  But, in a cluster, it seems
that it takes a while for the system_add_keyspace call to propagate.  No
worries I think, I just modify my setup() to do
describe_keyspace(keyspace_UUID) in a while loop until the cluster is
ready.  My random failures drop considerably, but every once in a while I
see a similar kind of failure.  Then I find out that schema updates seem to
propagate on a per node basis.  At least, that's what I have to assume as
I'm using phpcassa which uses a connection pool, and I see in my logging
that my setup() succeeds because one connection in the pool sees the new
keyspace, but when my tests run I grab a connection from the pool that is
missing it!

Do I have a solution other than changing my setup yet again to loop over all
cassandra servers doing a describe_keyspace()?

-- 
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com


Re: Internal error processing get_range_slices

2011-04-20 Thread Jonathan Ellis
internal error means an error on the server. check the server log for the
stacktrace.

On Wed, Apr 20, 2011 at 11:54 AM, Renato Bacelar da Silveira 
renat...@indabamobile.co.za wrote:

  Hi all

 I am just augmenting the information on the following error:

 -- error --

 org.apache.thrift.TApplicationException: Internal error processing get_range_slices
 at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
 at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:724)
 at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704)
 at com.indaba.cassandra.thrift.ThriftManager.multigetSliceAcrossAllUsers(ThriftManager.java:180)
 at com.indaba.cassandra.thrift.ThriftManager.testMultigetSlice(ThriftManager.java:210)
 at com.indaba.cassandra.thrift.ThriftManager.main(ThriftManager.java:260)
 -- error --

 I was able to access the method that sends the cassandra query call,
 and I do not see anything different from what I have specified,
 as in any values changed, missing or such...

 Basically I have a range slice query with an empty start_key and
 end_key keyrange, with an Integer.MAX_VALUE [return count value].

 I also have a SlicePredicate with 3 column names I would like to
 find within my column family. I also specify a column parent, but
 NO super column name, so as to roll through the entire range of super columns.
 From the documentation I gathered one could leave that out of the Column
 Parent object path so as to make this multiget work (should be getting a
 book), so I think I have it covered.

 Below is the code I am using, and just after, the variable values at
 the time when
 Cassandra.send_get_range_slices(ColumnParent column_parent,
 SlicePredicate predicate,
 KeyRange range,
 ConsistencyLevel consistency_level)

 is called:

 -- code --


 public Map<String, List<ColumnOrSuperColumn>>
 multigetSliceAcrossAllUsers(String[] colNames) {
     ColumnParent cp;
     Map<String, List<ColumnOrSuperColumn>> slicemap =
             new TreeMap<String, List<ColumnOrSuperColumn>>();
     List<KeySlice> lstKeyslice;
     List<ByteBuffer> lstColNames = new ArrayList<ByteBuffer>();
     for (String s : colNames) {
         lstColNames.add(ByteBufferUtil.bytes(s));
     }
     try {
         List<CfDef> lstColFamDef =
                 client.describe_keyspace(getCurrentKeyspaceName()).getCf_defs();
         for (CfDef def : lstColFamDef) {
             cp = new ColumnParent();
             cp.setColumn_family(def.getName());
             SlicePredicate slicePrd = new SlicePredicate();
             slicePrd.setColumn_names(lstColNames);
             KeyRange kr = new KeyRange();
             kr.setCount(Integer.MAX_VALUE);
             kr.setStart_key(new byte[0]);
             kr.setEnd_key(new byte[0]);
             kr.setStart_keyIsSet(true);
             kr.setEnd_keyIsSet(true);
             try {
                 lstKeyslice = client.get_range_slices(cp, slicePrd, kr,
                         ConsistencyLevel.ANY);
                 for (KeySlice kslc : lstKeyslice) {
                     slicemap.put(new String(kslc.getKey()),
                             kslc.getColumns());
                 }
             } catch (UnavailableException e) {
                 e.printStackTrace();
             } catch (TimedOutException e) {
                 e.printStackTrace();
             }
         }
     } catch (NotFoundException e) {
         e.printStackTrace();
     } catch (InvalidRequestException e) {
         e.printStackTrace();
     } catch (TException e) {
         e.printStackTrace();
     }
     return slicemap;
 }


 -- code --


 -- arguments --

 args                   Cassandra$get_range_slices_args (id=77)
     column_parent          ColumnParent (id=46)
         column_family          UserKey_38 (id=97)
         super_column           null
     consistency_level      ConsistencyLevel (id=58)
         name                   ANY (id=86)
         ordinal                5
         value                  6
     predicate              SlicePredicate (id=54)
         column_names           ArrayList<E> (id=31)
             [0]                    HeapByteBuffer (id=137)
             [1]                    HeapByteBuffer (id=138)
             [2]                    HeapByteBuffer (id=139)
         slice_range            null
     range                  KeyRange (id=56)
         __isset_bit_vector     BitSet (id=88)
         count                  2147483647
         end_key                HeapByteBuffer (id=90)
             address                0
             bigEndian              true
             capacity               0
             hb                     (id=118)
             isReadOnly             false
             limit                  0
             mark                   -1
             nativeByteOrder        false
             offset                 0
             position               0
 end_token
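
One thing worth noting about the code above: ConsistencyLevel.ANY is a write-only
level, so reads are normally issued at ONE or QUORUM.  Also, rather than asking
for Integer.MAX_VALUE rows in a single call, range scans are usually paged.  A
minimal sketch of that paging pattern against the same 0.7 Thrift client follows;
the class name, helper name, and PAGE_SIZE are illustrative, not from the
original code.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.KeyRange;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;

// Sketch: walk all rows of a column family in bounded pages instead of one
// huge get_range_slices call.
public class RangePager {
    private static final int PAGE_SIZE = 100;

    public static List<KeySlice> fetchAll(Cassandra.Client client, ColumnParent cp,
                                          SlicePredicate predicate) throws Exception {
        List<KeySlice> all = new ArrayList<KeySlice>();
        byte[] startKey = new byte[0];
        while (true) {
            KeyRange kr = new KeyRange();
            kr.setStart_key(startKey);
            kr.setEnd_key(new byte[0]);
            kr.setCount(PAGE_SIZE);
            List<KeySlice> page =
                    client.get_range_slices(cp, predicate, kr, ConsistencyLevel.ONE);
            for (KeySlice ks : page) {
                // Every page after the first starts with the last key of the
                // previous page, so skip that duplicate row.
                if (startKey.length > 0 && Arrays.equals(ks.getKey(), startKey))
                    continue;
                all.add(ks);
            }
            if (page.size() < PAGE_SIZE)
                break;                     // reached the end of the range
            startKey = page.get(page.size() - 1).getKey();
        }
        return all;
    }
}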

Re: system_* consistency level?

2011-04-20 Thread Jonathan Ellis
See the comments for describe_schema_versions.
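
For reference, what that comment describes is waiting until
describe_schema_versions() reports a single schema version across all reachable
nodes before touching the new keyspace.  A minimal sketch against the 0.7 Thrift
client; the class name, method name, and timeout are illustrative:

import java.util.List;
import java.util.Map;

import org.apache.cassandra.thrift.Cassandra;

// Sketch: block until every reachable node reports the same schema version.
// Unreachable nodes are listed under the "UNREACHABLE" key and ignored here.
public final class SchemaAgreement {
    public static void await(Cassandra.Client client, long timeoutMs) throws Exception {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            // schema version UUID -> endpoints currently reporting that version
            Map<String, List<String>> versions = client.describe_schema_versions();
            int distinct = 0;
            for (String version : versions.keySet()) {
                if (!"UNREACHABLE".equals(version))
                    distinct++;
            }
            if (distinct <= 1)
                return;                    // all reachable nodes agree
            Thread.sleep(200);             // back off a little before re-polling
        }
        throw new RuntimeException("schema agreement not reached within " + timeoutMs + " ms");
    }
}

In the test setup() this would run right after system_add_keyspace, before any
connection from the pool touches the new keyspace.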

On Wed, Apr 20, 2011 at 4:59 PM, William Oberman
ober...@civicscience.com wrote:


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Cannot find row when using 3 indices for search, able to find it using only 2

2011-04-20 Thread Constantin Teodorescu
Cassandra 0.7.4 on 4 nodes, Linux Ubuntu 10.10 i386, 32-bit

root@bigcouch-106:/etc/cassandra# nodetool -h 172.16.1.106 ring
Address         Status State   Load     Owns    Token
172.16.1.104    Up     Normal  1.8 GB   22.33%  4778396862879243066278530647513341098
172.16.1.8      Up     Normal  1.48 GB  28.12%  52627163731801348483758292043565262417
172.16.1.106    Up     Normal  1.21 GB  27.22%  98934176951395683802275136006692518904
172.16.1.110    Up     Normal  1.12 GB  22.33%  136934291168078629024171054299313117062

I am using keyspace 'bnd', column family 'pet', defined as

update column family pet with column_metadata = [
  {column_name: P_cui,  validation_class:UTF8Type, index_type:
KEYS},
  {column_name: P_nume,  validation_class:UTF8Type, index_type: KEYS},
  {column_name: P_prenume, validation_class:UTF8Type, index_type: KEYS}
];

Trying to find a row using 2 indices (P_cui and P_prenume) works:
[default@bnd] get pet where P_cui='1670518330770' and
P_prenume='CONSTANTIN';
---
RowKey: RO1492360605
=> (column=A1RO35486663, value=313a463a323030332d30342d30313a32333730, timestamp=1303181522507175)
=> (column=P_adresa, value=4c4954454e49, timestamp=1303181522507175)
=> (column=P_cui, value=1670518330770, timestamp=1303181522507175)
=> (column=P_nume, value=Manoliu, timestamp=1303181522507175)
=> (column=P_prenume, value=CONSTANTIN, timestamp=1303181522507175)
=> (column=P_tip, value=36, timestamp=1303253832349129)

1 Row Returned.

Finding it using the other 2 indices (P_prenume and P_nume) also works fine:
[default@bnd] get pet where P_prenume='CONSTANTIN' and P_nume='Manoliu';
---
RowKey: RO1492360605
=> (column=A1RO35486663, value=313a463a323030332d30342d30313a32333730, timestamp=1303181522507175)
=> (column=P_adresa, value=4c4954454e49, timestamp=1303181522507175)
=> (column=P_cui, value=1670518330770, timestamp=1303181522507175)
=> (column=P_nume, value=Manoliu, timestamp=1303181522507175)
=> (column=P_prenume, value=CONSTANTIN, timestamp=1303181522507175)
=> (column=P_tip, value=36, timestamp=1303253832349129)

1 Row Returned.

--

Trying to find the same row using all 3 indices does not work:
[default@bnd] get pet where P_cui='1670518330770' and P_prenume='CONSTANTIN'
and P_nume='Manoliu';

0 Row Returned.

Any clues?
Teo
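
For reference, the same three-clause query expressed against the 0.7 Thrift API
uses get_indexed_slices with one IndexExpression per indexed column.  A minimal
sketch; the row-count limits and consistency level here are illustrative:

import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.IndexClause;
import org.apache.cassandra.thrift.IndexExpression;
import org.apache.cassandra.thrift.IndexOperator;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.cassandra.utils.ByteBufferUtil;

public class ThreeIndexQuery {
    // Sketch: the CLI query above, issued programmatically.
    public static List<KeySlice> run(Cassandra.Client client) throws Exception {
        IndexClause clause = new IndexClause();
        clause.setStart_key(new byte[0]);
        clause.setCount(100);
        clause.addToExpressions(new IndexExpression(ByteBufferUtil.bytes("P_cui"),
                IndexOperator.EQ, ByteBufferUtil.bytes("1670518330770")));
        clause.addToExpressions(new IndexExpression(ByteBufferUtil.bytes("P_prenume"),
                IndexOperator.EQ, ByteBufferUtil.bytes("CONSTANTIN")));
        clause.addToExpressions(new IndexExpression(ByteBufferUtil.bytes("P_nume"),
                IndexOperator.EQ, ByteBufferUtil.bytes("Manoliu")));

        // Return all columns of each matching row (up to 100 per row).
        SlicePredicate allColumns = new SlicePredicate();
        allColumns.setSlice_range(new SliceRange(ByteBufferUtil.bytes(""),
                ByteBufferUtil.bytes(""), false, 100));

        ColumnParent parent = new ColumnParent();
        parent.setColumn_family("pet");
        return client.get_indexed_slices(parent, clause, allColumns, ConsistencyLevel.QUORUM);
    }
}

The CLI is itself a Thrift client, so this call and the CLI query should end up
on the same server-side indexed-query path.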


Re: Cannot find row when using 3 indices for search, able to find it using only 2

2011-04-20 Thread Constantin Teodorescu
Thank you, I'll wait for the 0.7.5 release and test it again!
So far I'm satisfied with cassandra; we are evaluating it for migrating
our PostgreSQL solution to a mixed [couchdb + bigcouch + cassandra]
architecture!
Best regards,
Teo

On Thu, Apr 21, 2011 at 1:15 AM, Jonathan Ellis jbel...@gmail.com wrote:

 sounds like https://issues.apache.org/jira/browse/CASSANDRA-2347


CQL in future 8.0 cassandra will work as I'm expecting ?

2011-04-20 Thread Constantin Teodorescu
My use case is as follows: in 70% of the jobs we retrieve information
using keys, column names and ranges, and so far what we have tested suits
our needs.
However, the remaining 30% of the jobs involve a full sequential scan of all
records in the database.

I found some web pages describing the next big thing for the cassandra 0.8
release, CQL, and I'm wondering: will CQL execution involve separate
processes running simultaneously on all nodes in the cluster that do
the filtering and pre-sorting phase on the locally stored data (using
indexes when available), and then execute the merge phase on the single node
that received the request?

Best regards,
Teo


Re: Multi-DC Deployment

2011-04-20 Thread Terje Marthinussen
Assuming that you generally put an API on top of this, delivering to two or
more systems then boils down to a message queue issue or some similar
mechanism which handles secure delivery of messages. Maybe not trivial, but
there are many products that can help you with this, and it is a lot easier
to implement than a fully distributed storage system.

Yes, ideally Cassandra will not distribute corruption, but the reason you
pay up to have 2 fully redundant setups in 2 different datacenters is
because we do not live in an ideal world. Anyone having tested Cassandra
since 0.7.0 with any real data will be able to testify how well it can mess
things up.

This is not specific to Cassandra; in fact, I would argue that this is in
the blood of any distributed system. You want them to distribute, after all,
and the tighter the coupling is between nodes, the better they distribute
bad stuff as well as good stuff.

There is a bigger risk for a complete failure with 2 tightly coupled
redundant systems than with 2 almost completely isolated ones. The logic
here is so simple it is really somewhat beyond discussion.

There are a few other advantages to isolating the systems. Especially in
terms of operation, 2 isolated systems would be much easier, as you could
relatively risk-free try out a new cassandra version in one datacenter, or
upgrade one datacenter at a time if you needed major operational changes
such as schema changes or other large changes to the data.

I see 2 copies in one datacenter + 1 (or maybe 2) in another as a low-cost
middle way between 2 full N+2 (RF=3) systems in both datacenters.

That is, in a traditional design where you need 1 node for normal service,
you would have 1 extra replica for redundancy and one more replica (N+2
redundancy) so you can do maintenance and still be redundant.

If I have redundancy across datacenters, I would probably still want 2
replicas to avoid network traffic between DCs in case of a node recovery,
but N+2 may not be needed, as my risk policy may find it acceptable to run
one datacenter without redundancy for a time-limited period of
maintenance.

That is, if my original requirement is 1 node, I could do with 3x the HW
which is not all that much more than the 3x I need for one DC and a lot less
than the 6x I need for 2 full N+2 systems.

However, all of the above is really beyond the point of my original
suggestion.

Regardless of datacenters, redundancy and distribution of bad or good stuff,
it would be good to have a way to return whatever data is there, but with a
flag or similar stating that the consistency level was not met.

Again, for a lot of services it is fully acceptable, and a lot better, to
return an almost complete (or maybe even complete, but not verified by
quorum) result than no result at all.

As far as I remember from the code, this just boils down to returning
whatever you collected from the cluster and setting the proper flag or
similar on the resultset rather than returning an error.

Terje

On Thu, Apr 21, 2011 at 5:01 AM, Adrian Cockcroft 
adrian.cockcr...@gmail.com wrote:

 Hi Terje,

 If you feed data to two rings, you will get inconsistency drift as an
 update to one succeeds and to the other fails from time to time. You
 would have to build your own read repair. This all starts to look like
 "I don't trust Cassandra code to work, so I will write my own buggy
 one-off versions of Cassandra functionality." I lean towards using
 Cassandra features rather than rolling my own because there is a large
 community testing, fixing and extending Cassandra, and making sure
 that the algorithms are robust. Distributed systems are very hard to
 get right, I trust lots of users and eyeballs on the code more than
 even the best engineer working alone.

 Cassandra doesn't replicate sstable corruptions. It detects corrupt
 data and only replicates good data. Also data isn't replicated to
 three identical nodes in the way you imply, it's replicated around the
 ring. If you lose three nodes, you don't lose a whole node's worth of
 data.  We configure each replica to be in a different availability
 zone so that we can lose a third of our nodes (a whole zone) and still
 work. On a 300 node system with RF=3 and no zones, losing one or two
 nodes you still have all your data, and can repair the loss quickly.
 With three nodes dead at once you don't lose 1% of the data (3/300) I
 think you lose 1/(300*300*300) of the data (someone check my math?).

 If you want to always get a result, use a read at ONE; if you want a
 highly available, better quality result, use LOCAL_QUORUM. That is a
 per-query option.

 Adrian

 On Tue, Apr 19, 2011 at 6:46 PM, Terje Marthinussen
 tmarthinus...@gmail.com wrote:
  If you have RF=3 in both datacenters, it could be discussed if there is a
  point to use the built in replication in Cassandra at all vs. feeding the
  data to both datacenters and get 2 100% isolated cassandra instances that
  cannot replicate sstable corruptions 

Re: system_* consistency level?

2011-04-20 Thread William Oberman
That was the trick.  Thanks!

On Apr 20, 2011, at 6:05 PM, Jonathan Ellis jbel...@gmail.com wrote:

 See the comments for describe_schema_versions.



Re: Multi-DC Deployment

2011-04-20 Thread Adrian Cockcroft
Queues replicate bad data just as well as anything else. The biggest
source of bad data is broken app code... You will still need to
implement a reconciliation/repair checker, as queues have their own
failure modes when they get backed up. We have also looked at using
queues to bounce data between cassandra clusters for other reasons,
and they have their place. However it is a lot more work to implement
than using existing well tested Cassandra functionality to do it for
us.

I think your code needs to retry a failed local-quorum read with a
read-one to get the behavior you are asking for.
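
A minimal sketch of that retry pattern against the Thrift API; the key, column
parent, and predicate are whatever the application already uses, and the class
and method names are illustrative:

import java.nio.ByteBuffer;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.TimedOutException;
import org.apache.cassandra.thrift.UnavailableException;

public class FallbackReads {
    // Sketch: try LOCAL_QUORUM first, then degrade to ONE instead of failing.
    public static List<ColumnOrSuperColumn> read(Cassandra.Client client, ByteBuffer key,
            ColumnParent parent, SlicePredicate predicate) throws Exception {
        try {
            return client.get_slice(key, parent, predicate, ConsistencyLevel.LOCAL_QUORUM);
        } catch (UnavailableException e) {
            // Not enough replicas alive in the local DC: accept weaker consistency.
            return client.get_slice(key, parent, predicate, ConsistencyLevel.ONE);
        } catch (TimedOutException e) {
            // Replicas too slow: same trade-off, return whatever one node has.
            return client.get_slice(key, parent, predicate, ConsistencyLevel.ONE);
        }
    }
}

Note that LOCAL_QUORUM assumes the keyspace uses NetworkTopologyStrategy.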

Our approach to bad data and corruption issues is backups: wind back
to the last good snapshot. We have figured out incremental backups as
well as full ones. Our code has some local dependencies, but it could be
the basis for a generic solution.

Adrian

On Wed, Apr 20, 2011 at 6:08 PM, Terje Marthinussen
tmarthinus...@gmail.com wrote:

seed faq

2011-04-20 Thread Maki Watanabe
I made a self-answered FAQ on seeds after reading the wiki and the code.
If I have misunderstood something, please point it out to me.

== What are seeds? ==

Seeds, or seed nodes, are the nodes that new nodes contact on
bootstrap to learn the ring information.
When you add a new node to the ring, you need to specify at least one live
seed for it to contact. Once a node has joined the ring, it learns about the
other nodes, so it doesn't need a seed on subsequent boots.

There is no special configuration for the seed node itself. In a stable and
static ring, you can point a bootstrapping node at a non-seed node as its
seed, though this is not recommended.
Nodes in the ring tend to send Gossip messages to seeds more often by
design, so it is probable that seeds have the most recent and up-to-date
information about the ring. ( Refer to [[ArchitectureGossip]] for more
details )

== Does single seed mean single point of failure? ==

If you are using replicated CFs on the ring, a single seed does not mean a
single point of failure. The ring can operate and boot without the seed, but
it is recommended to have multiple seeds in a production system to keep the
ring easy to maintain.



Thanks
-- 
maki


Re: CQL in future 8.0 cassandra will work as I'm expecting ?

2011-04-20 Thread Jonathan Ellis
You want to run map/reduce jobs for your use case. You can already do
this with Cassandra (http://wiki.apache.org/cassandra/HadoopSupport),
and DataStax is introducing Brisk soon to make it easier:
http://www.datastax.com/products/brisk
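
For the 30% full-scan case, a rough sketch of what such a Hadoop job looks like,
modelled on the word_count example bundled with the source tree.  The
ConfigHelper method names and the mapper's key/value types below are as they
appear in the 0.7-era example as far as I recall, and should be double-checked
against the exact minor version in use; the keyspace, column family, and limits
are placeholders:

import java.nio.ByteBuffer;
import java.util.SortedMap;

import org.apache.cassandra.db.IColumn;
import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Rough sketch of a full column-family scan as a Hadoop job.
public class FullScanJob {
    public static class ScanMapper
            extends Mapper<ByteBuffer, SortedMap<ByteBuffer, IColumn>, Text, IntWritable> {
        @Override
        protected void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context ctx)
                throws java.io.IOException, InterruptedException {
            // Each call sees one row: its key and the columns requested below.
            ctx.write(new Text(ByteBufferUtil.string(key)), new IntWritable(columns.size()));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "full-scan");
        job.setJarByClass(FullScanJob.class);
        job.setMapperClass(ScanMapper.class);
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));

        // Which keyspace/CF to scan and which columns to pull per row.
        // Cluster contact details (rpc address/port, partitioner) also need to
        // be set, via ConfigHelper or cassandra.yaml depending on the 0.7.x version.
        ConfigHelper.setInputColumnFamily(job.getConfiguration(), "bnd", "pet");
        SlicePredicate pred = new SlicePredicate().setSlice_range(
                new SliceRange(ByteBufferUtil.bytes(""), ByteBufferUtil.bytes(""), false, 1000));
        ConfigHelper.setInputSlicePredicate(job.getConfiguration(), pred);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}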

On Wed, Apr 20, 2011 at 9:36 PM, Jonathan Ellis jbel...@gmail.com wrote:
 CQL changes the API, that is all.

 On Wed, Apr 20, 2011 at 5:40 PM, Constantin Teodorescu
 braila...@gmail.com wrote:




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com