When will 0.8.1 be released?

2011-05-23 Thread Donal Zang

Hi,
Is there a planned date for the 0.8.1 release?
I want to use the CompositeType comparator:

https://issues.apache.org/jira/browse/CASSANDRA-2231

Thanks!
Donal




get_indexed_slices count api

2011-05-31 Thread Donal Zang

Hi,
I'm querying Cassandra with something like select count(*) from table where
column1 = v1 and ..., based on a secondary index on column1.
But with get_indexed_slices(), I have to fetch all the matching rows and
count them myself, which should not be necessary.
So a get_indexed_slices count API [1] would be very helpful, but it
seems no one is working on it now. (I see it's related to [2], which
is blocked by [3].)


My question is: will a get_indexed_slices count API be provided? Or
should I build CF-based indexes myself, like
http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html?


Thanks!
Donal

https://issues.apache.org/jira/browse/CASSANDRA-2601 [1]
https://issues.apache.org/jira/browse/CASSANDRA-1600 [2]
https://issues.apache.org/jira/browse/CASSANDRA-1034 [3]
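Until something like [1] exists, the count has to be done on the client. A minimal sketch, assuming a hypothetical fetch_page() wrapper around get_indexed_slices() that pages by an inclusive start key; the in-memory fake below stands in for Cassandra and simply orders keys as strings (a real cluster returns rows in its own order, which the loop does not depend on):

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[str, Dict[str, str]]
FetchPage = Callable[[Optional[str], int], List[Row]]


def count_indexed_rows(fetch_page: FetchPage, page_size: int = 1000) -> int:
    """Count the rows an index query matches, one page at a time.

    fetch_page(start_key, count) is hypothetical: it returns up to `count`
    matching rows starting at start_key (inclusive; None = from the start).
    """
    if page_size < 2:
        raise ValueError("page_size must be at least 2")  # otherwise paging cannot advance
    total = 0
    start_key: Optional[str] = None
    while True:
        page = fetch_page(start_key, page_size)
        new_rows = page
        if start_key is not None and page and page[0][0] == start_key:
            new_rows = page[1:]  # start_key is inclusive: skip the row already counted
        total += len(new_rows)
        if len(page) < page_size:
            return total  # a short page means there are no more matches
        start_key = page[-1][0]


# In-memory stand-in for an indexed column family (not a pycassa API).
def make_fake_fetch(rows: Dict[str, Dict[str, str]], column: str, value: str) -> FetchPage:
    matching = sorted(k for k, cols in rows.items() if cols.get(column) == value)

    def fetch_page(start_key: Optional[str], count: int) -> List[Row]:
        keys = [k for k in matching if start_key is None or k >= start_key]
        return [(k, rows[k]) for k in keys[:count]]

    return fetch_page


rows = {f"r{i:03d}": {"column1": "v1" if i % 3 == 0 else "v2"} for i in range(10)}
print(count_indexed_rows(make_fake_fetch(rows, "column1", "v1"), page_size=2))  # 4
```

The rows still cross the network, which is exactly the cost a server-side count would avoid; the sketch only keeps client memory bounded.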




slow insertion rate with secondary index

2011-06-05 Thread Donal Zang

I did an insertion test with and without secondary indexes, and found that:
Without a secondary index: ~10864 rows inserted per second
With a secondary index on one column (BytesType): ~1515 rows inserted per
second

Is this normal? Why would a secondary index have so much effect?

I noticed that if I build the index using “update column family ...”
after inserting all the data (90578207 rows), it finishes very quickly.
I'm not very clear about how secondary indexes work; could someone
explain this?

Thanks!
Donal
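For scale, simple arithmetic on the rates reported above, assuming each rate held constant for the whole load of 90578207 rows:

```latex
\frac{10864\ \text{rows/s}}{1515\ \text{rows/s}} \approx 7.2,
\qquad
\frac{90\,578\,207\ \text{rows}}{10864\ \text{rows/s}} \approx 8.3\times 10^{3}\ \text{s} \approx 2.3\ \text{h},
\qquad
\frac{90\,578\,207\ \text{rows}}{1515\ \text{rows/s}} \approx 6.0\times 10^{4}\ \text{s} \approx 16.6\ \text{h}
```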




Re: slow insertion rate with secondary index

2011-06-06 Thread Donal Zang

On 06/06/2011 10:15, David Boxenhorn wrote:
Is there really a 10x difference between indexed CFs and non-indexed CFs? 

Well, in my test, it is!
I'm using 0.7.6-2, 9 nodes, 3 replicas, write_consistency_level QUORUM,
about 90,000,000 rows (~1 KB per row).

I use 20 processes, with 20 rows per insertion.
Without an index, inserting a whole row takes about 0.02 seconds.
After I add a secondary index and update every row's indexed column,
the insertion time is about 2 seconds.

If I remove the index and update the column again, it takes about 0.002 seconds.

Another thing I noticed: if you first insert the data, then build
the secondary index with "update column family ...", and then select
based on the index, the results are not correct (it seems the index is
still being built even though the update command returns quickly). And
after a while, get_indexed_slices() times out from time to time (with
pycassa.ConnectionPool('keyspace1', ['host1','host2'], timeout=600,
pool_size=1)).


Has anyone else had similar experiences with secondary indexes?

--
Donal Zang
Computing Center, IHEP
19B YuquanLu, Shijingshan District,Beijing, 100049
zan...@ihep.ac.cn
86 010 8823 6018




Re: insert slowdown with secondary indexes

2011-06-11 Thread Donal Zang

On 11/06/2011 02:27, jodylandren...@comcast.net wrote:

I'm trying to understand why doing the inserts into a column family with 
indexes seems to jam things up and am wondering if there are any settings that 
I could tweak to help. It seems that the 4 node cluster should be able to 
handle 2 threads of data coming at it.  Has anyone had any experience with this 
number of indexes per column family? Any insight or suggestions would be 
appreciated.

Hi,
I posted an email about this earlier; see the mailing list archive.
The secondary index currently uses a hash method, which causes random
I/O on insertion (so lots of swap work). Queries based on it can be
slow too.
So my advice would be: don't use the secondary index, at least for now
(there are plans to build a bitmap index [1]).
You can try Ed Anuff's method [2] of building a CF as your index; it's
much faster than the secondary index. (This method may need the
CompositeType [3].)


[1] https://issues.apache.org/jira/browse/CASSANDRA-1472
[2] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
[3] https://issues.apache.org/jira/browse/CASSANDRA-2231
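A simplified in-memory model of the CF-as-index idea, not Ed Anuff's exact scheme from [2]: one wide index row per indexed value, whose column names are the matching data row keys, so a lookup (or a count) is a read of a single row. The class and method names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List


class ManualIndex:
    """Models a data CF plus a hand-maintained index CF.

    data:  row key -> {column name: value}
    index: indexed value -> {data row key: b""}  (column names carry the data)
    """

    def __init__(self, indexed_column: str) -> None:
        self.indexed_column = indexed_column
        self.data: Dict[str, Dict[str, str]] = {}
        self.index: Dict[str, Dict[str, bytes]] = defaultdict(dict)

    def insert(self, row_key: str, columns: Dict[str, str]) -> None:
        row = self.data.setdefault(row_key, {})
        old = row.get(self.indexed_column)
        new = columns.get(self.indexed_column)
        if old is not None and new is not None and old != new:
            # Remove the stale entry so the row is no longer found under its old value.
            self.index[old].pop(row_key, None)
        row.update(columns)
        if new is not None:
            self.index[new][row_key] = b""

    def lookup(self, value: str) -> List[str]:
        # One read of one index row; sorted() stands in for the comparator order.
        return sorted(self.index.get(value, {}))


idx = ManualIndex("column1")
idx.insert("k1", {"column1": "v1", "other": "x"})
idx.insert("k2", {"column1": "v2"})
idx.insert("k3", {"column1": "v1"})
idx.insert("k3", {"column1": "v2"})  # k3 moves from v1 to v2
print(idx.lookup("v1"))  # ['k1']
print(idx.lookup("v2"))  # ['k2', 'k3']
```

Keeping the index exact still needs the old value (a read before the write); the sketch just looks it up in memory, while a real client would have to read it or track it some other way.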





Re: Querying superColumn

2011-06-16 Thread Donal Zang

Well, what you are looking for is a secondary index.
But for now, AFAIK, super columns cannot use secondary indexes.
On 16/06/2011 13:55, Vivek Mishra wrote:


Now for rowKey 'DEPT1' I have inserted multiple super column like:

Employee1{
  Name: Vivek
  country: India
}

Employee2{
  Name: Vivs
  country: USA
}

Now if I want to retrieve a super column whose rowkey is 'DEPT1' and  
employee name is 'Vivek'. Can I get only 'EMPLOYEE1' ?
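Without an index, the fallback is to read the 'DEPT1' row and filter the super columns on the client. A minimal sketch, assuming the row is small enough to fetch whole; the row is modelled as a plain dict and the function name is hypothetical:

```python
from typing import Dict, List

SuperRow = Dict[str, Dict[str, str]]  # super column name -> {subcolumn: value}


def super_columns_where(row: SuperRow, subcolumn: str, value: str) -> List[str]:
    """Names of the super columns whose `subcolumn` equals `value` (client-side filter)."""
    return [name for name, subs in row.items() if subs.get(subcolumn) == value]


dept1: SuperRow = {
    "Employee1": {"Name": "Vivek", "country": "India"},
    "Employee2": {"Name": "Vivs", "country": "USA"},
}
print(super_columns_where(dept1, "Name", "Vivek"))  # ['Employee1']
```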








Re: Counter Column

2011-06-27 Thread Donal Zang

On 27/06/2011 17:04, Artem Orobets wrote:


Hi!

As I know, we use counter column only with replication factor ALL, so 
is it mean that we can't read data while any replica will fail?


You can use any consistency level, as long as you set
replicate_on_write=true when creating the counter column family.





Re: Counter Column

2011-06-28 Thread Donal Zang

On 27/06/2011 19:19, Sylvain Lebresne wrote:

Let me make that simpler.

Don't ever use replicate_on_write=false (even if you think that it is
what you want, there is a good chance it's not).
Obviously, the default is replicate_on_write=true.
I may be wrong, but with 0.8.0 I think the default is
replicate_on_write=false, so you have to declare it explicitly.






Re: CompositeType for row Keys

2011-07-22 Thread Donal Zang
If you are using OPP, you can use CompositeType for both the key and the
column name; otherwise (RandomPartitioner), use it only for columns.

On 22/07/2011 17:10, Patrick Julien wrote:

With the current implementation of CompositeType in Cassandra 0.8.1,
is it recommended practice to try to use a CompositeType as the key?
Or are both, column and key, equally well supported?

The documentation on CompositeType is light, well non-existent really, with

key_validation_class set to CompositeType (UUIDType, IntegerType)

can we query all matching rows just by using CompositeType(UUIDType)?

In my specific use case, what would work best is to have a composite
key that is a CompositeType with thousands of columns each.
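A toy illustration of why the partitioner matters for composite keys: a prefix query such as "all rows with this UUID" is one range scan only if those rows are adjacent in ring order. Assumptions: tuple sorting stands in for the order-preserving byte order of the encoded key, the token function is modelled on RandomPartitioner's MD5 token (digest read as a signed integer, absolute value), and the key encoding is made up:

```python
import hashlib
from typing import Callable, List, Sequence, Tuple

Key = Tuple[str, int]  # stands in for a CompositeType(UUIDType, IntegerType) row key


def rp_token(key: Key) -> int:
    """RandomPartitioner-style token: MD5 of the key bytes as a signed big-endian int, abs()."""
    raw = f"{key[0]}|{key[1]}".encode("utf-8")  # hypothetical encoding of the composite
    return abs(int.from_bytes(hashlib.md5(raw).digest(), "big", signed=True))


def is_contiguous(ring_order: Sequence[Key], pred: Callable[[Key], bool]) -> bool:
    """True if all keys matching pred form one unbroken run, i.e. a single
    range scan from the first to the last match returns exactly those rows."""
    hits = [i for i, k in enumerate(ring_order) if pred(k)]
    return not hits or hits[-1] - hits[0] + 1 == len(hits)


def first_part_is_b(k: Key) -> bool:
    return k[0] == "uuid-b"


keys: List[Key] = [(u, n) for u in ("uuid-a", "uuid-b", "uuid-c") for n in range(50)]
opp_order = sorted(keys)               # order-preserving: rows sorted by key
rp_order = sorted(keys, key=rp_token)  # RandomPartitioner: rows sorted by token

print(is_contiguous(opp_order, first_part_is_b))  # True
print(is_contiguous(rp_order, first_part_is_b))
```

With MD5 tokens the 50 "uuid-b" rows are scattered around the ring, so the second check is expected to print False; that is why, under RandomPartitioner, CompositeType is mainly useful in column names, where the comparator does control the order.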








Re: CompositeType for row Keys

2011-07-22 Thread Donal Zang

On 22/07/2011 17:56, Patrick Julien wrote:

I can still use it for keys if I don't need ranges then?  Because for
what we are doing we can always re-assemble keys

Yes, but why would you use CompositeType if you don't need range queries?





Re: Fwd: Counter consistency - are counters idempotent?

2011-07-22 Thread Donal Zang

On 22/07/2011 18:08, Yang wrote:

btw, this issue of  not knowing whether a write is persisted or not
when client reports error, is not limited to counters,  for regular
columns, it's the same: if client reports write failure, the value may
well be replicated to all replicas later.  this is even the same with
all other systems: Zookeeper, Paxos, ultimately due to the FLP
theoretical result of no guarantee of consensus in async systems

Yes, but with regular columns a retry is safe, while with counters it is not.


-- Forwarded message --
From: Sylvain Lebresne sylv...@datastax.com
Date: Fri, Jul 22, 2011 at 8:03 AM
Subject: Re: Counter consistency - are counters idempotent?
To: user@cassandra.apache.org


On Fri, Jul 22, 2011 at 4:52 PM, Kenny Yu kenny...@knewton.com wrote:

As of Cassandra 0.8.1, are counter increments and decrements idempotent? If,
for example, a client sends an increment request and the increment occurs,
but the network subsequently fails and reports a failure to the client, will
Cassandra retry the increment (thus leading to an overcount and inconsistent
data)?
I have done some reading and I am getting conflicting sources about counter
consistency. In this source
(http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/clarification-of-the-consistency-guarantees-of-Counters-td6421010.html),
it states that counters now have the same consistency as regular
columns--does this imply that the above example will not lead to an
overcount?

That email thread was arguably a bit imprecise in its use of the word
'consistency', but what it was talking about is really consistency
level. That is, counters support all the usual consistency levels (ONE,
QUORUM, ALL, LOCAL_QUORUM, EACH_QUORUM) except ANY.
Counters are still not idempotent. And just a small precision: if you
get a TimeoutException, Cassandra never retries the increment on its
own (your sentence suggests it does), but in that case you won't know
whether the increment was persisted or not, and thus you won't know
whether you should retry. And yes, this is still a limitation of
counters.



If counters are not idempotent, are there examples of effective uses of
counters that will prevent inconsistent counts?
Thank you for your help.
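A toy model of the difference described above, assuming the worst case where the mutation is applied but the acknowledgement is lost, so the client sees a timeout and retries. The classes and functions are hypothetical, not a Cassandra or pycassa API:

```python
from typing import Callable, Dict


class Replica:
    """Toy replica holding regular columns and counter columns."""

    def __init__(self) -> None:
        self.columns: Dict[str, str] = {}
        self.counters: Dict[str, int] = {}

    def write(self, name: str, value: str) -> None:
        self.columns[name] = value  # overwrite: applying it twice == applying it once

    def increment(self, name: str, delta: int) -> None:
        self.counters[name] = self.counters.get(name, 0) + delta  # every call adds again


def send_and_lose_ack(op: Callable[[], None]) -> None:
    """The mutation is applied, but the client still sees a timeout."""
    op()
    raise TimeoutError("write applied, acknowledgement lost")


def client_with_retry(op: Callable[[], None]) -> None:
    try:
        send_and_lose_ack(op)
    except TimeoutError:
        op()  # the client cannot tell whether the first attempt landed, so it retries


r = Replica()
client_with_retry(lambda: r.write("status", "done"))
client_with_retry(lambda: r.increment("hits", 1))
print(r.columns["status"])  # done
print(r.counters["hits"])   # 2: one logical increment, counted twice
```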







How can I see how many rows are on each node?

2010-12-03 Thread Donal Zang

As the subject says.
Is there any command or API for this?

Thanks!




Can super column family use column_metadata?

2010-12-07 Thread Donal Zang

Hi,
I'm using 0.7.0-rc1, and when I use cassandra-cli to create a column
family with metadata, I get null and no column family is created.

The command I use is:

create keyspace test;
use test;
create column family test1 with column_type = 'Super' and comparator =
'LongType' and column_metadata
=[{column_name:a,validation_class:LongType}];


Also, the examples given by "help create column family;" don't work!

Any ideas?
Thanks!


Re: Between Clause

2011-01-17 Thread Donal Zang

On 17/01/2011 11:55, kh jo wrote:
What is the best way to model a query with between clause.. given that 
you have a large number of entries...


thanks
Jo




In my experience, for a row-based 'between clause' with the random
partitioner, you should design the column family carefully so that you
can compute all the row keys you need.
Then you can use multiget() instead of get_range() across rows, and do
a range query between columns within each row.
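A sketch of that approach, assuming time-series data bucketed into one row per day (key prefix:YYYY-MM-DD) with zero-padded HH:MM:SS column names, so column order matches time order. The dict lookup stands in for multiget() and the filtered loop for a column slice; the key layout and function names are hypothetical:

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Store = Dict[str, Dict[str, str]]  # row key -> {column name: value}; models one CF


def day_keys(prefix: str, start: datetime, end: datetime) -> List[str]:
    """Every day-bucket row key from start to end, inclusive. Because the keys
    can be computed, no key range scan is needed under RandomPartitioner."""
    first, last = start.date(), end.date()
    return [f"{prefix}:{first + timedelta(days=d)}" for d in range((last - first).days + 1)]


def between(store: Store, prefix: str, start: datetime, end: datetime) -> List[Tuple[str, str]]:
    keys = day_keys(prefix, start, end)
    rows = {k: store[k] for k in keys if k in store}  # stands in for multiget(keys)
    result: List[Tuple[str, str]] = []
    for i, key in enumerate(keys):
        # Only the first and last day are cut; the days in between are read whole.
        lo = start.strftime("%H:%M:%S") if i == 0 else "00:00:00"
        hi = end.strftime("%H:%M:%S") if i == len(keys) - 1 else "23:59:59"
        row = rows.get(key, {})
        day = key[len(prefix) + 1:]
        # Column slice [lo, hi] within the row.
        result += [(f"{day} {col}", row[col]) for col in sorted(row) if lo <= col <= hi]
    return result


store: Store = {
    "events:2011-01-16": {"23:00:00": "a"},
    "events:2011-01-17": {"08:00:00": "b", "12:00:00": "c"},
    "events:2011-01-18": {"06:00:00": "d", "18:00:00": "e"},
}
print(between(store, "events", datetime(2011, 1, 16, 23, 30), datetime(2011, 1, 18, 7, 0)))
# [('2011-01-17 08:00:00', 'b'), ('2011-01-17 12:00:00', 'c'), ('2011-01-18 06:00:00', 'd')]
```

The bucket size is a trade-off: smaller buckets mean more keys in the multiget, larger ones mean wider rows.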




--




Should nodetool repair be run periodically to keep consistency?

2011-01-19 Thread Donal Zang

Just to make sure:
should this be done manually by the cluster operators?

Thanks!

--





Re: Cassandra automatic startup script on ubuntu

2011-01-20 Thread Donal Zang

On 20/01/2011 17:51, Sébastien Druon wrote:

Hello!

I am using cassandra on a ubuntu machine and installed it from the 
binary found on the cassandra home page.

However, I did not find any scripts to start it up at boot time.

Where can I find this kind of script?

Thanks a lot in advance

Sebastien

Hi, this is what I do; you can add the watchdog script to rc.local:

#!/bin/bash
#
# This script checks every $INTERVAL seconds whether cassandra
# is listening on its Thrift port, and restarts it if necessary.
# by donal 2010-01-11
#
PORT=9160
INTERVAL=2
CASSANDRA=/opt/cassandra
check() {
  # grep -q only sets the exit status; ":$1 " avoids matching longer port numbers
  if ! netstat -tln | grep LISTEN | grep -q ":$1 "; then
    echo "restarting cassandra"
    "$CASSANDRA/bin/stop-server"
    sleep 1
    "$CASSANDRA/bin/start-server"
  fi
}
while true; do
  check "$PORT"
  sleep "$INTERVAL"
done



apache.org down?

2011-03-03 Thread Donal Zang

As the subject says.

--
Donal Zang





NullPointerException with 0.7.4

2011-04-03 Thread Donal Zang

Hi,

I'm running a stress test, and Cassandra crashed with this exception:
ERROR [MutationStage:9] 2011-04-03 21:11:50,152 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.NullPointerException
at org.apache.cassandra.io.sstable.IndexSummary$KeyPosition.compareTo(IndexSummary.java:100)
at org.apache.cassandra.io.sstable.IndexSummary$KeyPosition.compareTo(IndexSummary.java:87)
at java.util.Collections.indexedBinarySearch(Collections.java:232)
at java.util.Collections.binarySearch(Collections.java:218)
at org.apache.cassandra.io.sstable.SSTableReader.getIndexScanPosition(SSTableReader.java:333)
at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:459)
at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:563)
at org.apache.cassandra.db.columniterator.SSTableNamesIterator.init(SSTableNamesIterator.java:61)
at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:58)
at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1353)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1245)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1173)
at org.apache.cassandra.db.Table.readCurrentIndexedColumns(Table.java:459)
at org.apache.cassandra.db.Table.apply(Table.java:394)
at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:76)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)

--

Donal Zang
CERN PH-ADP-DDM 40-3-D16
CH-1211 Geneve 23
donal.z...@cern.ch
+41 22 76 71268





Statistics query on Cassandra

2011-04-04 Thread Donal Zang

Can we do a count like this?
count cf[startKey:endKey] where column = value
