unsubscribe

2012-05-30 Thread Maxim Potekhin




row cache -- does it have data from other nodes?

2012-05-17 Thread Maxim Potekhin

Hello,

when I choose to have a row cache -- will it contain data that is owned by
other nodes?


Thanks

Maxim



Re: Server Side Logic/Script - Triggers / StoreProc

2012-04-29 Thread Maxim Potekhin
About a year ago I started getting a strange feeling that
the noSQL community is busy re-creating RDBMS in minute detail.

Why did we bother in the first place?

Maxim



On 4/27/2012 6:49 PM, Data Craftsman wrote:
 Howdy,

 Some Polyglot Persistence (NoSQL) products have started to support server-side
 scripting, similar to RDBMS stored procedures,
 e.g. Redis Lua scripting.

 I hope it will be Python when Cassandra gets its server-side scripting feature.

 FYI,

 http://antirez.com/post/250

 http://nosql.mypopescu.com/post/19949274021/alchemydb-an-integrated-graphdb-rdbms-kv-store

 Server-side scripting support is an extremely powerful tool. Having
 processing close to the data (i.e. data locality) is a well-known
 advantage, ..., and it can open the doors to completely new features.

 Thanks,

 Charlie (@mujiang) 一个 木匠
 ===
 Data Architect Developer
 http://mujiang.blogspot.com

 On Sun, Apr 22, 2012 at 9:35 AM, Brian O'Neill boneil...@gmail.com wrote:
 Praveen,

 We are certainly interested. To get things moving we implemented an add-on
 for Cassandra to demonstrate the viability (using AOP):
 https://github.com/hmsonline/cassandra-triggers

 Right now the implementation executes triggers asynchronously, allowing you
 to implement a java interface and plugin your own java class that will get
 called for every insert.

 Per the discussion on 1311, we intend to extend our proof of concept to be
 able to invoke scripts as well.  (minimally we'll enable javascript, but
 we'll probably allow for ruby and groovy as well)

 -brian

 On Apr 22, 2012, at 12:23 PM, Praveen Baratam wrote:

 I found that Triggers are coming in Cassandra 1.2
 (https://issues.apache.org/jira/browse/CASSANDRA-1311) but no mention of any
 StoreProc like pattern.

 I know this has been discussed so many times but never met with
 any initiative. Even Groovy was staged out of the trunk.

 Cassandra is great for logging and as such will be infinitely more useful if
 some logic can be pushed into the Cassandra cluster nearer to the location
 of Data to generate a materialized view useful for applications.

 Server Side Scripts/Routines in Distributed Databases could soon prove to be
 the differentiating factor.

 Let me reiterate things with a use case.

 In our application we store time series data in wide rows with TTL set on
 each point to prevent data from growing beyond acceptable limits. Still the
 data size can be a limiting factor to move all of it from the cluster node
 to the querying node and then to the application via thrift for processing
 and presentation.

 Ideally we should process the data on the residing node and pass only the
 materialized view of the data upstream. This should be trivial if Cassandra
 implements some sort of server side scripting and CQL semantics to call it.

 Is anybody else interested in a similar feature? Is it being worked on? Are
 there any alternative strategies to this problem?

 Praveen



 --
 Brian ONeill
 Lead Architect, Health Market Science (http://healthmarketscience.com)
 mobile:215.588.6024
 blog: http://weblogs.java.net/blog/boneill42/
 blog: http://brianoneill.blogspot.com/






Re: Cassandra search performance

2012-04-29 Thread Maxim Potekhin
Jason,

I'm using plenty of secondary indexes with no problem at all.

Looking at your example, as I think you understand, you forgo indexes by
combining two conditions in one query, thinking along the lines of what is
often done in an RDBMS. A scan is expected in this case, and there is no
magic to avoid it.

However, if this query is important, you can easily index on both conditions,
using a composite type (look it up) or string concatenation for a quick and
easy solution. That is, you _create an additional column_ which contains a
combination of the two values you want to use in the query, then index on it.
Problem solved.
The composite solution is more elegant, but what I describe works in
simple cases. It works for me.
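
Something along these lines, using pycassa purely for illustration (the CF,
column names and values are invented, and it assumes a KEYS index was created
on the combined column in the schema):

    import pycassa
    from pycassa.index import create_index_clause, create_index_expression

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'queue')

    # On write: store the two original columns plus one extra column that
    # concatenates them; the schema is assumed to carry a KEYS index on it.
    status, partition = 'Distributed', '5'
    cf.insert('row-key-1', {'status': status,
                            'partition': partition,
                            'status_partition': status + ':' + partition})

    # On read: a single equality clause against the combined column, so only
    # the rows that match both original conditions are touched.
    expr = create_index_expression('status_partition', 'Distributed:5')
    clause = create_index_clause([expr], count=1000)
    for key, columns in cf.get_indexed_slices(clause):
        print(key, columns)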

Maxim


On 4/25/2012 10:45 AM, Jason Tang wrote:
 1.0.8

 On April 25, 2012 at 10:38 PM, Philip Shon philip.s...@gmail.com wrote:

 What version of Cassandra are you using? I found a big performance
 hit when querying on the secondary index.

 I came across this bug in versions prior to 1.1

 https://issues.apache.org/jira/browse/CASSANDRA-3545

 Hope that helps.

 2012/4/25 Jason Tang ares.t...@gmail.com
 mailto:ares.t...@gmail.com

 And I found that if I only have the search condition status, it
 only scans 200 records.

 But if I combine it with another condition, partition, then it scans
 all records, because the partition condition matches all records.

 But combined with another condition such as userName, even though all
 userName values are the same across the 1,000,000 records, it only scans 200
 records.

 So it is impacted by the scan execution plan. If we have several
 search conditions, how does it work? Is there something like an
 execution plan in Cassandra?


 On April 25, 2012 at 9:18 PM, Jason Tang ares.t...@gmail.com wrote:

 Hi

 We have the CF below, and use a secondary index to search for
 a simple data status; among 1,000,000 row records, we
 have 200 records with the status we want.

 But when we start to search, the performance is very poor.
 Checking with the command ./bin/nodetool -h localhost -p
 8199 cfstats, Cassandra read 1,000,000 records, and the
 Read Latency is 0.2 ms, so in total it used 200 seconds.

 It uses lots of CPU, and checking the stack, all threads in
 Cassandra are reading from sockets.

 So I wonder how to really use the index to find the 200
 records instead of scanning all rows. (Super Column?)

 ColumnFamily: queue
 Key Validation Class: org.apache.cassandra.db.marshal.BytesType
 Default column value validator: org.apache.cassandra.db.marshal.BytesType
 Columns sorted by: org.apache.cassandra.db.marshal.BytesType
 Row cache size / save period in seconds / keys to save : 0.0/0/all
 Row Cache Provider: org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
 Key cache size / save period in seconds: 0.0/0
 GC grace seconds: 0
 Compaction min/max thresholds: 4/32
 Read repair chance: 0.0
 Replicate on write: false
 Bloom Filter FP chance: default
 Built indexes: [queue.idxStatus]
 Column Metadata:
 Column Name: status (737461747573)
 Validation Class: org.apache.cassandra.db.marshal.AsciiType
 Index Name: idxStatus
 Index Type: KEYS
 BRs
 //Jason







Re: RMI/JMX errors, weird

2012-04-24 Thread Maxim Potekhin

Hello Aaron,

it's probably the over-optimistic number of concurrent compactors that
was tripping up the system.

I do not entirely understand what the correlation is here; maybe the
compactors were overloading the neighboring nodes and causing time-outs.
I tuned the concurrency down and after a while things seem
to have settled down. Thanks for the suggestion.

Maxim


On 4/19/2012 4:13 PM, aaron morton wrote:

1150 pending tasks, and is not
making progress.
Not all pending tasks reported by nodetool compactionstats actually
run. Once they get a chance to run, the files they were going to work
on may have already been compacted.


Given that repair tests at double the phi threshold, it may not make 
much difference.


Did other nodes notice it was dead ? Was there anything in the log 
that showed it was under duress (GC or dropped message logs) ?


Is the compaction a consequence of repair ? (The streaming stage can 
result in compactions). Or do you think the node is just behind on 
compactions ?


If you feel compaction is hurting the node, consider 
setting concurrent_compactors in the yaml to 2.


You can also isolate the node from updates using nodetool
disablegossip and disablethrift, and then turn off the IO limiter
with nodetool setcompactionthroughput 0.
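
For example (the yaml setting plus the nodetool invocations; host and JMX
port are placeholders to adjust for your setup):

    # cassandra.yaml
    concurrent_compactors: 2

    # take the node out of gossip and stop client (thrift) traffic,
    # then lift the compaction IO cap
    nodetool -h <host> -p 7199 disablegossip
    nodetool -h <host> -p 7199 disablethrift
    nodetool -h <host> -p 7199 setcompactionthroughput 0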

Hope that helps.
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/04/2012, at 12:29 AM, Maxim Potekhin wrote:


Hello Aaron,

how should I go about fixing that? Also, after a repeated attempt to
compact, it goes again into building the secondary index with 1150 pending
tasks, and is not making progress. I suspected a disk system failure,
but this needs to be confirmed.


So basically, do I need to tune the phi threshold up? The thing is, there
was no heavy load on the cluster at all.

Thanks

Maxim




On 4/19/2012 7:06 AM, aaron morton wrote:
At some point the gossip system on the node this log is from decided 
that 130.199.185.195 was DOWN. This was based on how often the node 
was gossiping to the cluster.


The active repair session was informed. And to avoid failing the job
unnecessarily, it tested whether the errant node's phi value was twice
the configured phi_convict_threshold. It was, and the repair was killed.


Take a look at the logs on 130.199.185.195 and see if anything was 
happening on the node at the same time. Could  be GC or an 
overloaded node (it would log about dropped messages).


Perhaps other nodes also saw 130.199.185.195 as down? it only needed 
to be down for a few seconds.


Hope that helps.

-
Aaron Morton





Re: RMI/JMX errors, weird

2012-04-18 Thread Maxim Potekhin
)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at 
sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)

at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: java.io.IOException: Problem 
during repair session manual-repair-1b3453b6-28b5-4abd-84ce-0326

b5468064, endpoint /130.199.185.193 died
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

at java.util.concurrent.FutureTask.run(FutureTask.java:138)
... 3 more
Caused by: java.io.IOException: Problem during repair session 
manual-repair-1b3453b6-28b5-4abd-84ce-0326b5468064, endpoint /130.199.

185.193 died
at 
org.apache.cassandra.service.AntiEntropyService$RepairSession.failedNode(AntiEntropyService.java:723)
at 
org.apache.cassandra.service.AntiEntropyService$RepairSession.convict(AntiEntropyService.java:760)
at 
org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:165)
at 
org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:538)

at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:57)
at 
org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:157)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)




On 4/12/2012 10:03 PM, aaron morton wrote:

Look at the server side logs for errors.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 13/04/2012, at 11:47 AM, Maxim Potekhin wrote:


Hello,

I'm doing compactions under 0.8.8.

Recently, I started seeing a stack trace like the one below, and I can't
figure out what causes this to appear.
The cluster has been in operation for more than half a year w/o
errors like this one.


Any help will be appreciated,
Thanks

Maxim


WARNING: Failed to check the connection: 
java.net.SocketTimeoutException: Read timed out
Exception in thread main java.io.IOException: Repair command #1: 
some repair session(s) failed (see log for details).
   at 
org.apache.cassandra.service.StorageService.forceTableRepair(StorageService.java:1613)

   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
   at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
   at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
   at 
com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
   at 
com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
   at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
   at 
com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
   at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
   at 
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
   at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
   at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
   at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)

   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303

Re: Is the secondary index re-built under compaction?

2012-04-17 Thread Maxim Potekhin

Thanks Aaron. Just to be clear, every time I do a compaction,
I rebuild all indexes from scratch. Right?

Maxim


On 4/17/2012 6:16 AM, aaron morton wrote:

Yes secondary index builds are done via the compaction manager.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/04/2012, at 1:06 PM, Maxim Potekhin wrote:

I noticed that nodetool compactionstats shows the building of the 
secondary index while

I initiate compaction. Is this to be expected? Cassandra version 0.8.8.

Thank you

Maxim







Re: Is the secondary index re-built under compaction?

2012-04-17 Thread Maxim Potekhin

Thanks Jake. Then I am definitely seeing weirdness, as there are tons of
pending tasks in compaction stats, and tons of index files created in the
data directory. Plus it does tell me that it is building the secondary 
index,

and that seems to be happening at an amazingly glacial pace.

I have 2 CFs there, with multiple secondary indexes. I'll try
to compact the CF one by one, reboot and see if that helps.

Maxim


On 4/17/2012 9:53 AM, Jake Luciani wrote:
No, the indexes are not rebuilt every compaction.  Only if you 
manually rebuild or bootstrap a new node does it use compaction 
manager to rebuild.


On Tue, Apr 17, 2012 at 9:47 AM, Maxim Potekhin potek...@bnl.gov 
mailto:potek...@bnl.gov wrote:


Thanks Aaaron. Just to be clear, every time I do a compaction,
I rebuild all indexes from scratch. Right?

Maxim



On 4/17/2012 6:16 AM, aaron morton wrote:

Yes secondary index builds are done via the compaction manager.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/04/2012, at 1:06 PM, Maxim Potekhin wrote:


I noticed that nodetool compactionstats shows the building of
the secondary index while
I initiate compaction. Is this to be expected? Cassandra version
0.8.8.

Thank you

Maxim








--
http://twitter.com/tjake




Re: Is the secondary index re-built under compaction?

2012-04-17 Thread Maxim Potekhin
I understand that indexes are CFs. But the compaction stats says it's 
building the
index, not compacting the corresponding CF. Either that's an ambiguous 
diagnostic,

or indeed something is not right with my rig as of late.

Maxim



On 4/17/2012 10:05 AM, Jake Luciani wrote:
Well, since the secondary indexes are themselves
column families, they too are compacted along with everything else.


On Tue, Apr 17, 2012 at 10:02 AM, Maxim Potekhin potek...@bnl.gov 
mailto:potek...@bnl.gov wrote:


Thanks Jake. Then I am definitely seeing weirdness, as there are
tons of
pending tasks in compaction stats, and tons of index files
created in the
data directory. Plus it does tell me that it is building the
secondary index,
and that seems to be happening at an amazingly glacial pace.

I have 2 CFs there, with multiple secondary indexes. I'll try
to compact the CF one by one, reboot and see if that helps.

Maxim



On 4/17/2012 9:53 AM, Jake Luciani wrote:

No, the indexes are not rebuilt every compaction.  Only if you
manually rebuild or bootstrap a new node does it use compaction
manager to rebuild.

On Tue, Apr 17, 2012 at 9:47 AM, Maxim Potekhin potek...@bnl.gov
mailto:potek...@bnl.gov wrote:

Thanks Aaaron. Just to be clear, every time I do a compaction,
I rebuild all indexes from scratch. Right?

Maxim



On 4/17/2012 6:16 AM, aaron morton wrote:

Yes secondary index builds are done via the compaction manager.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/04/2012, at 1:06 PM, Maxim Potekhin wrote:


I noticed that nodetool compactionstats shows the
building of the secondary index while
I initiate compaction. Is this to be expected? Cassandra
version 0.8.8.

Thank you

Maxim








-- 
http://twitter.com/tjake





--
http://twitter.com/tjake




Re: Is the secondary index re-built under compaction?

2012-04-17 Thread Maxim Potekhin
Yes. Sorry I didn't mention this, but of course I'm checking on indexes 
once in a while.

So yes, they are marked as built.

All of this started happening after a few days of a continuous loading
process. Since the nodes have good hardware (24 cores + SSD), the apparent
load on each node was nothing remarkable, even at a 20 kHz insertion rate.
But maybe I'm being overoptimistic.


Maxim


On 4/17/2012 10:12 AM, Jake Luciani wrote:

Hmm that does sound fishy.

When you run show keyspaces from cassandra-cli it shows which indexes 
are built.  Are they marked built in your column family?


-Jake

On Tue, Apr 17, 2012 at 10:09 AM, Maxim Potekhin potek...@bnl.gov 
mailto:potek...@bnl.gov wrote:


I understand that indexes are CFs. But the compaction stats says
it's building the
index, not compacting the corresponding CF. Either that's an
ambiguous diagnostic,
or indeed something is not right with my rig as of late.

Maxim




On 4/17/2012 10:05 AM, Jake Luciani wrote:

Well, the since the secondary indexes are themselves
column families they too are compacted along with everything else.

On Tue, Apr 17, 2012 at 10:02 AM, Maxim Potekhin
potek...@bnl.gov mailto:potek...@bnl.gov wrote:

Thanks Jake. Then I am definitely seeing weirdness, as there
are tons of
pending tasks in compaction stats, and tons of index files
created in the
data directory. Plus it does tell me that it is building the
secondary index,
and that seems to be happening at an amazingly glacial pace.

I have 2 CFs there, with multiple secondary indexes. I'll try
to compact the CF one by one, reboot and see if that helps.

Maxim



On 4/17/2012 9:53 AM, Jake Luciani wrote:

No, the indexes are not rebuilt every compaction.  Only if
you manually rebuild or bootstrap a new node does it use
compaction manager to rebuild.

On Tue, Apr 17, 2012 at 9:47 AM, Maxim Potekhin
potek...@bnl.gov mailto:potek...@bnl.gov wrote:

Thanks Aaaron. Just to be clear, every time I do a
compaction,
I rebuild all indexes from scratch. Right?

Maxim



On 4/17/2012 6:16 AM, aaron morton wrote:

Yes secondary index builds are done via the compaction
manager.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/04/2012, at 1:06 PM, Maxim Potekhin wrote:


I noticed that nodetool compactionstats shows the
building of the secondary index while
I initiate compaction. Is this to be expected?
Cassandra version 0.8.8.

Thank you

Maxim








-- 
http://twitter.com/tjake





-- 
http://twitter.com/tjake





--
http://twitter.com/tjake




Re: Is the secondary index re-built under compaction?

2012-04-17 Thread Maxim Potekhin

The offending CF only has one. The other one, that seems to behave well,
has nine.

Maxim


On 4/17/2012 10:20 AM, Jake Luciani wrote:

How many indexes are there?

On Tue, Apr 17, 2012 at 10:16 AM, Maxim Potekhin potek...@bnl.gov 
mailto:potek...@bnl.gov wrote:


Yes. Sorry I didn't mention this, but of course I'm checking on
indexes once in a while.
So yes, they are marked as built.

All of this started happening after a few days of continuous
loading process. Since
the nodes have good hardware (24 cores + SSD), the apparent load
on each node
was nothing remarkable, even at 20kHz insertion rate. But maybe
I'm being overoptimistic.

Maxim



On 4/17/2012 10:12 AM, Jake Luciani wrote:

Hmm that does sound fishy.

When you run show keyspaces from cassandra-cli it shows which
indexes are built.  Are they marked built in your column family?

-Jake

On Tue, Apr 17, 2012 at 10:09 AM, Maxim Potekhin
potek...@bnl.gov mailto:potek...@bnl.gov wrote:

I understand that indexes are CFs. But the compaction stats
says it's building the
index, not compacting the corresponding CF. Either that's an
ambiguous diagnostic,
or indeed something is not right with my rig as of late.

Maxim




On 4/17/2012 10:05 AM, Jake Luciani wrote:

Well, the since the secondary indexes are themselves
column families they too are compacted along with everything
else.

On Tue, Apr 17, 2012 at 10:02 AM, Maxim Potekhin
potek...@bnl.gov mailto:potek...@bnl.gov wrote:

Thanks Jake. Then I am definitely seeing weirdness, as
there are tons of
pending tasks in compaction stats, and tons of index
files created in the
data directory. Plus it does tell me that it is building
the secondary index,
and that seems to be happening at an amazingly glacial pace.

I have 2 CFs there, with multiple secondary indexes.
I'll try
to compact the CF one by one, reboot and see if that helps.

Maxim



On 4/17/2012 9:53 AM, Jake Luciani wrote:

No, the indexes are not rebuilt every compaction.  Only
if you manually rebuild or bootstrap a new node does it
use compaction manager to rebuild.

On Tue, Apr 17, 2012 at 9:47 AM, Maxim Potekhin
potek...@bnl.gov mailto:potek...@bnl.gov wrote:

Thanks Aaaron. Just to be clear, every time I do a
compaction,
I rebuild all indexes from scratch. Right?

Maxim



On 4/17/2012 6:16 AM, aaron morton wrote:

Yes secondary index builds are done via the
compaction manager.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/04/2012, at 1:06 PM, Maxim Potekhin wrote:


I noticed that nodetool compactionstats shows
the building of the secondary index while
I initiate compaction. Is this to be expected?
Cassandra version 0.8.8.

Thank you

Maxim








-- 
http://twitter.com/tjake





-- 
http://twitter.com/tjake





-- 
http://twitter.com/tjake





--
http://twitter.com/tjake




Is the secondary index re-built under compaction?

2012-04-16 Thread Maxim Potekhin
I noticed that nodetool compactionstats shows the building of the 
secondary index while

I initiate compaction. Is this to be expected? Cassandra version 0.8.8.

Thank you

Maxim



RMI/JMX errors, weird

2012-04-12 Thread Maxim Potekhin

Hello,

I'm doing compactions under 0.8.8.

Recently, I started seeing a stack trace like the one below, and I can't
figure out what causes this to appear.
The cluster has been in operation for more than half a year w/o errors
like this one.


Any help will be appreciated,
Thanks

Maxim


WARNING: Failed to check the connection: 
java.net.SocketTimeoutException: Read timed out
Exception in thread main java.io.IOException: Repair command #1: some 
repair session(s) failed (see log for details).
at 
org.apache.cassandra.service.StorageService.forceTableRepair(StorageService.java:1613)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at 
com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
at 
com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at 
sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)

at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)



a very simple indexing question (strange thing seen in CLI)

2012-04-07 Thread Maxim Potekhin

Greetings,
Cassandra 0.8.8 is used.

I'm trying to create an additional CF which is trivial in all respects. 
Just ascii columns and a few indexes.


This is how I add an index:
update column family files with column_metadata = [{column_name : '1',  
validation_class : AsciiType, index_type : 0, index_name : 'pandaid'}];


When I do show keyspaces, I see this:

ColumnFamily: files
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: 
org.apache.cassandra.db.marshal.BytesType

  Columns sorted by: org.apache.cassandra.db.marshal.BytesType
  Row cache size / save period in seconds: 0.0/0
  Row Cache Provider: 
org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider

  Key cache size / save period in seconds: 20.0/14400
  Memtable thresholds: 2.2828125/1440/487 (millions of ops/minutes/MB)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true
  Built indexes: [files.pandaid]
  Column Metadata:
Column Name:  (01)
  Validation Class: org.apache.cassandra.db.marshal.AsciiType
  Index Name: pandaid
  Index Type: KEYS

First off, why do I see (01)? I have a similar CF where I just see 1. 
Before inserting the data, I did an 'assume' to ascii
on the keys, comparator and validator. The index has been built. When I 
try to access the data via the index, I get this:

[default@PANDA] get files where '1'='1460103677';
InvalidRequestException(why:No indexed columns present in index clause 
with operator EQ)



What is happening? Sorry for the admittedly trivial question, obviously 
I'm stuck with something quite simple

which I managed to do with zero effort in the past.

Maxim





Re: import

2012-04-01 Thread Maxim Potekhin

Since Python has a native csv module, it's trivial to achieve.
I load lots of csv data into my database daily.
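
A bare-bones sketch of the kind of loader I mean, using the csv module
together with pycassa (the keyspace, CF, file and column names here are
made up):

    import csv
    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'files')

    with open('export.csv', 'rb') as f:
        reader = csv.DictReader(f)       # first line of the export is the header
        # queue mutations and send them in batches instead of one RPC per row
        with cf.batch(queue_size=200) as batch:
            for row in reader:
                key = row.pop('id')      # assume one CSV column holds the row key
                batch.insert(key, row)   # remaining CSV columns become Cassandra columns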

Maxim

On 3/27/2012 11:44 AM, R. Verlangen wrote:
You can write your own script to parse the excel file (export as csv) 
and import it with batch inserts.


Should be pretty easy if you have experience with those techniques.

2012/3/27 puneet loya puneetl...@gmail.com mailto:puneetl...@gmail.com

I want to import files from excel to cassandra? Is it possible??

Any tool that can help??

Whats the best way??

Plz reply :)




--
With kind regards,

Robin Verlangen
www.robinverlangen.nl http://www.robinverlangen.nl





Building a brand new cluster and readying it for production -- advice needed

2012-03-13 Thread Maxim Potekhin

Dear All,

after all the testing and continuous operation of my first cluster,
I've been given an OK to build a second production Cassandra cluster in 
Europe.


There were posts in recent weeks regarding the most stable and solid 
Cassandra version.

I was wondering if anything better has appeared since it was last discussed.

At this juncture, I don't need features, just rock solid stability. Are 
0.8.* versions still acceptable,

since I have experience with these, or should I take the plunge to 1+?

I realize that I won't need more than 8GB RAM because I can't make the Java
heap too big. Is it still worth it to pay money for extra RAM? Is the cache
located outside of the heap in recent versions?


Thanks to all of you for the advice I'm receiving on this board.

Best regards

Maxim



Re: Implications of length of column names

2012-02-28 Thread Maxim Potekhin
When I migrated data from our RDBMS, I hashed column names to integers.
This makes for some footwork, but the space gain is clearly there, so it's
worth it. I de-hash on read.
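
Roughly along these lines (a toy sketch; in practice the name-to-integer map
is kept somewhere persistent, and the codes are encoded more compactly than
as strings):

    # fixed mapping from verbose RDBMS column names to short integer codes
    NAME_TO_ID = {'vehicle_identifier': 1, 'position_timestamp': 2, 'gps_quality': 3}
    ID_TO_NAME = {v: k for k, v in NAME_TO_ID.items()}

    def hash_columns(row):
        """Replace verbose column names with their integer codes before writing."""
        return {str(NAME_TO_ID[name]): value for name, value in row.items()}

    def dehash_columns(stored):
        """Restore the original names after reading."""
        return {ID_TO_NAME[int(code)]: value for code, value in stored.items()}

    row = {'vehicle_identifier': 'V42', 'gps_quality': 'good'}
    assert dehash_columns(hash_columns(row)) == row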


Maxim


On 2/10/2012 5:15 PM, Narendra Sharma wrote:
It is good to have short column names. They save space all the way
from network transfer to in-memory usage to storage. It is also a good
idea to club immutable columns that are read together and store them as a
single column. We gained significant overall performance benefits from
this.


-Naren

On Fri, Feb 10, 2012 at 12:20 PM, Drew Kutcharian d...@venarc.com 
mailto:d...@venarc.com wrote:


What are the implications of using short vs long column names? Is
it better to use short column names or longer ones?

I know for MongoDB you are better of using short field names
http://www.mongodb.org/display/DOCS/Optimizing+Storage+of+Small+Objects 
 Does this apply to Cassandra column names?



-- Drew




--
Narendra Sharma
Software Engineer
/http://www.aeris.com http://www.persistentsys.com/
/http://narendrasharma.blogspot.com//






Please advise -- 750MB object possible?

2012-02-22 Thread Maxim Potekhin

Hello everybody,

I'm being asked whether we can serve an object, which I assume is a
blob, of 750MB size.
I guess the real question is how to chunk it and/or whether it's even
possible to chunk it.


Thanks!

Maxim



Re: Please advise -- 750MB object possible?

2012-02-22 Thread Maxim Potekhin

The idea was to provide redundancy, resilience, automatic load balancing
and automatic repairs. Going the way of the file system does not achieve 
any of that.


Maxim


On 2/22/2012 1:34 PM, Mohit Anchlia wrote:

Outside on the file system and a pointer to it in C*

On Wed, Feb 22, 2012 at 10:03 AM, Rafael Almeida almeida...@yahoo.com 
mailto:almeida...@yahoo.com wrote:


Keep them where?


*From:* Mohit Anchlia mohitanch...@gmail.com
mailto:mohitanch...@gmail.com
*To:* user@cassandra.apache.org
mailto:user@cassandra.apache.org
*Cc:* potek...@bnl.gov mailto:potek...@bnl.gov
*Sent:* Wednesday, February 22, 2012 3:44 PM
*Subject:* Re: Please advise -- 750MB object possible?

In my opinion if you are busy site or application keep blobs
out of the database.

On Wed, Feb 22, 2012 at 9:37 AM, Dan Retzlaff
dretzl...@gmail.com mailto:dretzl...@gmail.com wrote:

Chunking is a good idea, but you'll have to do it
yourself. A few of the columns in our application got
quite large (maybe ~150MB) and the failure mode was RPC
timeout exceptions. Nodes couldn't always move that much
data across our data center interconnect in the default 10
seconds. With enough heap and a faster network you could
probably get by without chunking, but it's not ideal.
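
A rough sketch of what manual chunking can look like, using pycassa for
illustration (the chunk size, CF and row/column naming are arbitrary choices
here, not a recommendation):

    import pycassa

    CHUNK_SIZE = 4 * 1024 * 1024   # 4 MB per column keeps each RPC comfortably small

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    blobs = pycassa.ColumnFamily(pool, 'Blobs')

    def store_blob(blob_id, data):
        """Split one large byte string across numbered columns of a single row."""
        chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
        blobs.insert(blob_id, {'chunk_count': str(len(chunks))})
        for n, chunk in enumerate(chunks):
            # one insert per chunk keeps every thrift message well under the frame limit
            blobs.insert(blob_id, {'chunk_%06d' % n: chunk})

    def load_blob(blob_id):
        """Reassemble the blob by reading the numbered columns back one at a time."""
        count = int(blobs.get(blob_id, columns=['chunk_count'])['chunk_count'])
        parts = []
        for n in range(count):
            name = 'chunk_%06d' % n
            parts.append(blobs.get(blob_id, columns=[name])[name])
        return b''.join(parts)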


On Wed, Feb 22, 2012 at 9:04 AM, Maxim Potekhin
potek...@bnl.gov mailto:potek...@bnl.gov wrote:

Hello everybody,

I'm being asked whether we can serve an object,
which I assume is a blob, of 750MB size?
I guess the real question is of how to chunk it and/or
even it's possible to chunk it.

Thanks!

Maxim










Re: Please advise -- 750MB object possible?

2012-02-22 Thread Maxim Potekhin

Thank you so much, looks nice, I'll be looking into it.


On 2/22/2012 3:08 PM, Rob Coli wrote:



On Wed, Feb 22, 2012 at 10:37 AM, Maxim Potekhin potek...@bnl.gov 
mailto:potek...@bnl.gov wrote:


The idea was to provide redundancy, resilience, automatic load
balancing
and automatic repairs. Going the way of the file system does not
achieve any of that.


(Apologies for continuing slightly OT thread, but if people google and 
find this thread, I'd like to to contain the below relevant 
suggestion.. :D)


With the caveat that you would have to ensure that your client code 
streams instead of buffering the entire object, you probably want 
something like MogileFS :


http://danga.com/mogilefs/

I have operated a sizable MogileFS cluster for Digg, and it was one of 
the simplest, most comprehensible and least error prone parts of our 
infrastructure. A++ would run again.


--
=Robert Coli
rc...@palominodb.com mailto:rc...@palominodb.com




Re: nodetool hangs and didn't print anything with firewall

2012-02-08 Thread Maxim Potekhin

That's good to hear, because it does present a problem for
a strictly managed and firewalled campus environment.
Maxim


On 2/6/2012 11:57 AM, Nick Bailey wrote:

JMX is not very firewall friendly. The problem is that JMX is a
two-connection process. The first connection happens on port 7199 and the
second connection happens on some random port greater than 1024. Work on
changing this behavior was started in this ticket:

https://issues.apache.org/jira/browse/CASSANDRA-2967

On Mon, Feb 6, 2012 at 2:02 AM, R. Verlangenro...@us2.nl  wrote:

Do you allow both outbound and inbound traffic? You might also try allowing
both TCP and UDP.


2012/2/6 Roshancodeva...@gmail.com

Yes, if the firewall is disabled it works.

--
View this message in context:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/nodetool-hangs-and-didn-t-print-anything-with-firewall-tp7257286p7257310.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at
Nabble.com.






Re: Encrypting traffic between Hector client and Cassandra server

2012-01-31 Thread Maxim Potekhin

Hello,

do you see any value in having a web service over Cassandra, with the actual
end clients talking to it via https/ssl?
This way the cluster can be firewalled and therefore protected, plus you
get decent authentication/authorization right there.


Maxim


On 1/31/2012 5:21 PM, Xaero S wrote:


I have been trying to figure out how to secure/encrypt the traffic 
between the client (Hector) and the Cassandra Server. I looked at this 
link https://issues.apache.org/jira/browse/THRIFT-106. But since Thrift
sits in a layer below Hector, I am wondering how I can get Hector to
use the right Thrift calls to have the encryption happen. Also, where
can I get the instructions for any required setup for encrypting
the traffic between the Hector client and the Cassandra Server?


Would appreciate any help in this regard. Below are the setup versions

Cassandra Version - 0.8.7
Hector - 0.8.0-2
libthrift jar - 0.6.1


On a side note, we have setup internode encryption on the Cassandra 
server side and found the documentation for that easily.








Re: Restart cassandra every X days?

2012-01-28 Thread Maxim Potekhin

Sorry if this has been covered, I was concentrating solely on 0.8x --
can I just d/l 1.0.x and continue using same data on same cluster?

Maxim


On 1/28/2012 7:53 AM, R. Verlangen wrote:

Ok, seems that it's clear what I should do next ;-)

2012/1/28 aaron morton aa...@thelastpickle.com 
mailto:aa...@thelastpickle.com


There are no blockers to upgrading to 1.0.X.

A
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/01/2012, at 7:48 AM, R. Verlangen wrote:


Ok. Seems that an upgrade might fix these problems. Is Cassandra
1.x.x stable enough to upgrade for, or should we wait for a
couple of weeks?

2012/1/27 Edward Capriolo edlinuxg...@gmail.com
mailto:edlinuxg...@gmail.com

I would not say that issuing a restart after x days is a good
idea. You are mostly developing a superstition. You should
find the source of the problem. It could be JMX or thrift
clients not closing connections. We don't restart nodes on a
regimen; they work fine.


On Thursday, January 26, 2012, Mike Panchenko m...@mihasya.com
mailto:m...@mihasya.com wrote:
 There are two relevant bugs (that I know of), both resolved
in somewhat recent versions, which make somewhat regular
restarts beneficial
 https://issues.apache.org/jira/browse/CASSANDRA-2868
(memory leak in GCInspector, fixed in 0.7.9/0.8.5)
 https://issues.apache.org/jira/browse/CASSANDRA-2252 (heap
fragmentation due to the way memtables used to be allocated,
refactored in 1.0.0)
 Restarting daily is probably too frequent for either one of
those problems. We usually notice degraded performance in our
ancient cluster after ~2 weeks w/o a restart.
 As Aaron mentioned, if you have plenty of disk space,
there's no reason to worry about cruft sstables. The size
of your active set is what matters, and you can determine if
that's getting too big by watching for iowait (due to reads
from the data partition) and/or paging activity of the java
process. When you hit that problem, the solution is to 1. try
to tune your caches and 2. add more nodes to spread the load.
I'll reiterate - looking at raw disk space usage should not
be your guide for that.
 Forcing a GC generally works, but should not be relied
upon (note "suggest" in
http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#gc() ).
It's great news that 1.0 uses a better mechanism for
releasing unused sstables.
 nodetool compact triggers a major compaction and is no
longer recommended by datastax (details here:
http://www.datastax.com/docs/1.0/operations/tuning#tuning-compaction ,
bottom of the page).
 Hope this helps.
 Mike.
 On Wed, Jan 25, 2012 at 5:14 PM, aaron morton
aa...@thelastpickle.com mailto:aa...@thelastpickle.com wrote:

 That disk usage pattern is to be expected in pre-1.0
versions. Disk usage is far less interesting than disk free
space: if it's using 60GB and there is 200GB free, that's OK. If
it's using 60GB and there is 6MB free, that's a problem.
 In pre-1.0 the compacted files are deleted on disk by
waiting for the JVM to decide to GC all remaining references.
If there is not enough space on disk (to store the total size of the
files it is about to write or compact), GC is forced
and the files are deleted. Otherwise they will get deleted at
some point in the future.
 In 1.0 files are reference counted and space is freed much
sooner.
 With regard to regular maintenance, nodetool cleanup
removes data from a node that it is no longer a replica for.
This is only of use when you have done a token move.
 I would not recommend a daily restart of the cassandra
process. You will lose all the runtime optimizations the JVM
has made (I think the mapped file pages will stay resident),
as well as adding additional entropy to the system which must
be repaired via HH, RR or nodetool repair.
 If you want to see compacted files purged faster the best
approach would be to upgrade to 1.0.
 Hope that helps.
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com http://www.thelastpickle.com/
 On 26/01/2012, at 9:51 AM, R. Verlangen wrote:

 In his message he explains that it's for  Forcing a GC .
GC stands for garbage collection. For some more background
see:
http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)

Problematic deletes in 0.8.8

2012-01-27 Thread Maxim Potekhin

Hello,

after I thought I was out of the woods with data deletion in 0.8.8, I 
unfortunately

see undead data and other strange behavior. Let me clarify:

a) I do run repair and compaction well within GC_GRACE
b) deletes happen daily
c) after a few repairs, when I run an indexed query on the data that I 
tried to delete,
it takes a while, even when the result is 0 rows. This I don't quite 
understand -- I thought that the index itself should

be void of keys that were deleted. What takes so long?
d) finally, even when doing repairs after deletes, I do see the data 
that is not supposed to

be there.

ideas?

Maxim



Re: Restart cassandra every X days?

2012-01-25 Thread Maxim Potekhin
I also do repair, compact and cleanup every couple of days, and also
have daily restarts in crontab. It doesn't hurt, and I avoid having a node
become unresponsive after many days of operation, which has happened before.
Older files get cleaned up on restart.

It doesn't take long to shut down and restart a node,
so if there is enough replication in the cluster it's not an issue.

Maxim


On 1/25/2012 1:13 PM, Karl Hiramoto wrote:

On 01/25/12 16:09, R. Verlangen wrote:

Hi there,

I'm currently running a 2-node cluster for some small projects that 
might need to scale-up in the future: that's why we chose Cassandra. 
The actual problem is that one of the node's harddrive usage keeps 
growing.


For example:
- after a fresh restart ~ 10GB
- after a couple of days running ~ 60GB

I know that Cassandra uses lots of diskspace but is this still 
normal? I'm running cassandra 0.8.7





I run 9 nodes with cassandra 0.7.8   and we see this same behaviour, 
but we keep it under control by doing the sequence:


nodetool repair
nodetool compact
nodetool cleanup

According to the 1.0.x changelog IIRC this disk usage is supposed to 
be improved.



--
Karl




Re: Cassandra x MySQL Sharded - Insert Comparison

2012-01-24 Thread Maxim Potekhin
a) I hate to break it to you, but 6GB x 4 cores != 'high-end machine'. 
It's pretty much middle of the road consumer level these days.


b) Hosting the client and Cassandra on the same node is a Bad Idea. It 
will depend on what exactly the client will do, but in my experience it 
won't work too well in general.


c) Have you considered dual boot, so you can have a good operating 
system (as per Cassandra folks) in addition to Windows?


Maxim


On 1/22/2012 8:22 PM, Gustavo Gustavo wrote:

Ok guys, thank you for the valuable hints you gave me.
For sure, things will perform much better on a real hardware. But my 
object maybe isn't really to see what't the max throughput that the 
datastores have. It is more or less like, given an equal condition, 
which one would perform better.
But I'll do this way, I'm going to use a high-end machine (6GB RAM, 4 
cores) and run Cassandra, MySQL and the Client Test Application on the 
same machine. Unfortunately, I'll have to use Windows 7 as a host to 
the datastores.
From your experience, do you think that even in single node, can 
Cassandra beat in inserts a RDBMS? I've seen that InnoDB (something 
that compares to the other databases relational engine) is pretty 
slow. But when it comes to MyISAM, things are much faster.


/Gustavo

2012/1/22 Chris Gerken chrisger...@mindspring.com 
mailto:chrisger...@mindspring.com


Edward (and Maxim),

I agree.  I was just recalling previous performance bake-offs (for
other technologies, long time ago, galaxy far far away) in which
the customer had put together a mockup of the high throughput
expected in production and wanted to make a decision against that
one set of numbers.  We always found that both/all competing
products could be made to run faster due to unexpected factors in
the non-production test build.  For our side, we always started
simple and built up the throughput until we found a bottleneck.
 We fixed the bottleneck. Rinse and repeat.

Chris Gerken

chrisger...@mindspring.com mailto:chrisger...@mindspring.com
512.587.5261 tel:512.587.5261
http://www.linkedin.com/in/chgerken



On Jan 22, 2012, at 8:51 AM, Edward Capriolo wrote:


 In some sense one-for-one performance almost does not matter.
 Though I bet you can get Cassandra to do better (I remember old-school
 YCSB white paper benches against a sharded MySQL).

 One of the main bullet points of Cassandra is that if you want to grow
 from 4 nodes, to 8 nodes, to 14 nodes, and so on, Cassandra is
 elastic and supports online adding and removing of nodes. A
 do-it-yourself hash-mod sharding algorithm really has no upgrade path.

Edward

On Sun, Jan 22, 2012 at 9:26 AM, Chris Gerken
chrisger...@mindspring.com mailto:chrisger...@mindspring.com
wrote:

Howdy Gustavo,

One thing that jumped out at me is your having put two
cassandra images on the same box.  There may be enough CPU
and memory for the two images combined but you may be seeing
some other resource not being shared so nicely - network card
bandwidth, for example.

More generally, the real question is what the bottleneck is
(for both db's, actually).  Start with Cassandra running in
that configuration and start with one client thread sending
one request a second.  Look at the CPU, network and memory
 metrics for all boxes (including the client). Nothing should
 be even close to maxing out at that throughput. Now
incrementally increase one of the test parameters (number of
clients or number of inserts per second) just a bit (say from
one transaction to 5) and note the above metrics.  Keep
slowly increasing the test parameters, one at a time, until
one of the metrics maxes out.  That's the bottleneck you're
wondering about.  Fix that and the db, be it Cassandra or
MySQL) will move ahead of the other performance-wise.  Turn
your attention to the other db and repeat.

- Chris Gerken

On Jan 22, 2012, at 7:10 AM, Gustavo Gustavo wrote:


Hello,

 I've set up a testing environment for Cassandra and MySQL, to
compare both, regarding *performance only*. And I must admit
that I was expecting Cassandra to beat MySQL. But I've not
seen this happening up to now.
My application/use case is INSERT intensive, since I'm not
updating anything, just inserting all the time.
To compare both I created virtual machines with Ubuntu
11.10, and installed the latest versions of each datastore.
Each VM has 1GB of RAM. I've used VMs as a way to give both
datastores an equal sandbox.
MySQL is set up to work as sharded, with 2 databases, that
means that records are inserted to a specific instance based
on key % 2. The engine is MyISAM (InnoDB was really slow and
not 

Re: Cassandra usage

2012-01-24 Thread Maxim Potekhin

You provide zero information on what you are planning to do with the data.
Thus, your question is impossible to answer.


On 1/24/2012 9:38 PM, francesco.tangari@gmail.com wrote:
Do you think that for a standard project with 50,000,000 rows on
2-3 machines Cassandra is appropriate,

or should I use a normal DBMS?

--
francesco.tangari@gmail.com
Inviato con Sparrow http://www.sparrowmailapp.com/?sig





Re: Cassandra x MySQL Sharded - Insert Comparison

2012-01-22 Thread Maxim Potekhin

Hello,
I have some experience in benchmarking Cassandra against Oracle and in 
running on a VM cluster.


While the VM solution will work for many applications, it simply won't
cut it for all. In particular, I observed a large difference in insert
performance when I moved from a VM to real hardware. Why this is the case
can be due to a bazillion factors, including the high core count on my
real machines and vastly better I/O. The CPU is crucial for inserts
in Cassandra, and it may not be for an RDBMS.


Another factor is the potential bottleneck in the client. There are 
cases when you won't have enough muscle to handle the data in the client 
itself.


None of this is definitive, but I'm just throwing in a bit of my
experience from the past 12 months. Right now I'm able to sink data at
insane speeds, far beyond those of Oracle.


Maxim


On 1/22/2012 8:10 AM, Gustavo Gustavo wrote:

Hello,

I've set up a testing environment for Cassandra and MySQL, to compare 
both, regarding *performance only*. And I must admit that I was 
expecting Cassandra to beat MySQL. But I've not seen this happening up 
to now.
My application/use case is INSERT intensive, since I'm not updating 
anything, just inserting all the time.
To compare both I created virtual machines with Ubuntu 11.10, and 
installed the latest versions of each datastore. Each VM has 1GB of 
RAM. I've used VMs as a way to give both datastores an equal sandbox.
MySQL is set up to work as sharded, with 2 databases, that means that 
records are inserted to a specific instance based on key % 2. The 
engine is MyISAM (InnoDB was really slow and not really needed to my 
case). There's a primary compound key (integer and datetime columns) 
in this test table.

Let's name the nodes MySQL1 and MySQL2.
Cassandra is set up to work with 4 nodes, with keys (tokens) set up to 
distribute records evenly across the 4 nodes (nodetool ring reports 
25% to each node), replication factor 1 and RandomPartitioner, the 
other configs are left to default. Let's name the nodes Cassandra1, 
Cassandra2, Cassandra3 and Cassandra4.


I'm using 2 physical machines (Windows7) to host the 4 (Cassandra) or 
2 (MySQL) virtual machines, this way:

Machine1: MySQL1, Cassandra1, Cassandra3
Machine2: MySQL2, Cassandra2, Cassandra4
The machines have CPU and RAM enough to host Cassandra Cluster or 
MySQL Cluster at a time.


The client test applicatin is running in a third physical machine, 
with 8 threads doing inserts. The test application is written in C# 
(Windows7) using Aquiles high-level client.


My use case is a vehicle tracking system. So, let's suppose, from 
minute to minute, the vehicle sends its position together with some 
other GPS data and vehicle status information. The columns in my 
Cassandra cluster are just the DateTime (long value) of a position for 
a specific vehicle, and the value is all the other data serialized to 
binary format. Therefore, my CF really grows in the number of columns. So all
data is inserted only into one CF/Table named Positions. The key for
Cassandra is the VehicleID, and for MySQL VehicleID + PositionDateTime
(MySQL creates an index for this automatically). It is important to note that
MySQL threw tons of connection exceptions, even though the insert was
retried until it got through to MySQL.


My test case was to insert 1k positions for 1k vehicles over 10 days,
which gives 10,000,000 inserts.


The final throughput that my application had for this scenario was:

Cassandra x 4
2012-01-21 11:45:38,044 #6 [Logger.Log] INFO  -  Inserted 1 positions for 1000 vehicles (1000 inserts):
2012-01-21 11:45:38,082 #6 [Logger.Log] INFO  -  Total Time: 2:37:03,359
2012-01-21 11:45:38,085 #6 [Logger.Log] INFO  -  Throughput: 1061 inserts/s


And for MySQL x 2
2012-01-21 14:26:25,197 #6 [Logger.Log] INFO  -  Inserted 1 positions for 1000 vehicles (1000 inserts):
2012-01-21 14:26:25,250 #6 [Logger.Log] INFO  -  Total Time: 2:06:25,914
2012-01-21 14:26:25,263 #6 [Logger.Log] INFO  -  Throughput: 1318 inserts/s


Is there something that I'm missing here? Is this expected? Or is the
problem somewhere else, and that's hard to say looking at this
description?


Cheers,
Gustavo





Re: delay in data deleting in cassadra

2012-01-20 Thread Maxim Potekhin

Did you run repairs withing GC_GRACE all the time?



On 1/20/2012 3:42 AM, Shammi Jayasinghe wrote:

Hi,
  I am experiencing a delay in delete operations in Cassandra. It's as
follows. I am running a thread which contains the following three steps.


Step 01: Read data from column family foo[1]
Step 02: Process received data eg: bar1,bar2,bar3,bar4,bar5
Step 03: Remove those processed data from foo.[2]

 The problem occurs when this thread is invoked for the second time.
In that step, it returns some of the data that I already deleted in the
third step of the previous cycle.


Eg: it returns bar2,bar3,bar4,bar5

It seems that though I called the remove operation as follows [2], it takes
time to replicate to the file system. If I make the thread sleep for 5 secs
between the thread cycles, it does not
give me any data that I deleted in the third step.

 [1]. SliceQuery<String, String, byte[]> sliceQuery =
          HFactory.createSliceQuery(keyspace, stringSerializer, stringSerializer, bs);

      sliceQuery.setKey(queueName);
      sliceQuery.setRange("", "", false, messageCount);
      sliceQuery.setColumnFamily(USER_QUEUES_COLUMN_FAMILY);

 [2]. Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
      mutator.addDeletion(queueName, USER_QUEUES_COLUMN_FAMILY, messageId, stringSerializer);

      mutator.execute();


Is there a solution for this?

Cassadra version : 0.8.0
Libthrift version : 0.6.1


Thanks
Shammi
--
Best Regards,*

Shammi Jayasinghe*
Senior Software Engineer; WSO2, Inc.; http://wso2.com http://wso2.com/,
mobile: +94 71 4493085






Re: Cassandra to Oracle?

2012-01-20 Thread Maxim Potekhin

What makes you think that RDBMS will give you acceptable performance?

I guess you will try to index it to death (because otherwise the ad 
hoc queries won't work well if at all), and at this point you may be 
hit with a performance penalty.


It may be a good idea to interview users and build denormalized views in 
Cassandra, maybe on a separate look-up cluster. A few percent of users 
will be unhappy, but you'll find it hard to do better. I'm talking from 
my experience with an industrial strength RDBMS which doesn't scale very 
well for what you call ad-hoc queries.


Regards,
Maxim




On 1/20/2012 9:28 AM, Brian O'Neill wrote:


I can't remember if I asked this question before, but

We're using Cassandra as our transactional system, and building up 
quite a library of map/reduce jobs that perform data quality analysis, 
statistics, etc.

( 100 jobs now)

But... we are still struggling to provide an ad-hoc query mechanism 
for our users.


To fill that gap, I believe we still need to materialize our data in 
an RDBMS.


Anyone have any ideas?  Better ways to support ad-hoc queries?

Effectively, our users want to be able to select count(distinct Y) 
from X group by Z.

Where Y and Z are arbitrary columns of rows in X.

We believe we can create column families with different key structures 
(using Y and Z as row keys), but some column names we don't know / 
can't predict ahead of time.


Are people doing bulk exports?
Anyone trying to keep an RDBMS in synch in real-time?

-brian

--
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/





Re: Cassandra to Oracle?

2012-01-20 Thread Maxim Potekhin

I certainly agree with "difficult to predict". There is a Danish
proverb which goes: "it's difficult to make predictions, especially
about the future."

My point was that it's equally difficult with noSQL and RDBMS.
The latter requires indexing to operate well, and that's a potential
performance problem.

On 1/20/2012 7:55 PM, Mohit Anchlia wrote:

I think the problem stems from having data in a column that you need
to run an ad hoc query on and which is not denormalized. In most cases it's
difficult to predict the type of query that would be required.

Another way of solving this could be to index the fields in search engine.

On Fri, Jan 20, 2012 at 7:37 PM, Maxim Potekhinpotek...@bnl.gov  wrote:

What makes you think that RDBMS will give you acceptable performance?

I guess you will try to index it to death (because otherwise the ad hoc
queries won't work well if at all), and at this point you may be hit with a
performance penalty.

It may be a good idea to interview users and build denormalized views in
Cassandra, maybe on a separate look-up cluster. A few percent of users
will be unhappy, but you'll find it hard to do better. I'm talking from my
experience with an industrial strength RDBMS which doesn't scale very well
for what you call ad-hoc queries.

Regards,
Maxim





On 1/20/2012 9:28 AM, Brian O'Neill wrote:


I can't remember if I asked this question before, but

We're using Cassandra as our transactional system, and building up quite a
library of map/reduce jobs that perform data quality analysis, statistics,
etc.
(> 100 jobs now)

But... we are still struggling to provide an ad-hoc query mechanism for
our users.

To fill that gap, I believe we still need to materialize our data in an
RDBMS.

Anyone have any ideas?  Better ways to support ad-hoc queries?

Effectively, our users want to be able to select count(distinct Y) from X
group by Z.
Where Y and Z are arbitrary columns of rows in X.

We believe we can create column families with different key structures
(using Y and Z as row keys), but some column names we don't know / can't
predict ahead of time.

Are people doing bulk exports?
Anyone trying to keep an RDBMS in synch in real-time?

-brian

--
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/





Re: ideal cluster size

2012-01-20 Thread Maxim Potekhin

You can also scale not horizontally but diagonally,
i.e. RAID SSDs and use multicore CPUs. This means that
you'll get the same performance with fewer nodes, making
the cluster far easier to manage.

SSDs by themselves will give you an order of magnitude
improvement on I/O.


On 1/19/2012 9:17 PM, Thorsten von Eicken wrote:

We're embarking on a project where we estimate we will need on the order
of 100 cassandra nodes. The data set is perfectly partitionable, meaning
we have no queries that need to have access to all the data at once. We
expect to run with RF=2 or =3. Is there some notion of ideal cluster
size? Or perhaps asked differently, would it be easier to run one large
cluster or would it be easier to run a bunch of, say, 16 node clusters?
Everything we've done to date has fit into 4-5 node clusters.




Re: Using 5-6 bytes for cassandra timestamps vs 8…

2012-01-18 Thread Maxim Potekhin

I must have accidentally deleted all messages in this thread save this one.

At face value, we are talking about saving 2 bytes per column. I 
know it can add up with many columns, but relative to the size of the 
column -- is it THAT significant?


I made an effort to minimize my CF footprint by replacing the natural 
column keys with integers (and translating back and forth when writing 
and reading). It's easy to see that in my case I achieve close to 50% 
storage savings in the best case, and at least 30%. But if the column 
in question contains more than 20 bytes -- what's up with trying to save 2?
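
Rough numbers, just to put it in perspective (the sizes below are
assumptions, not measurements, and the fixed overhead is approximate):

# Back-of-the-envelope only: per-column overhead is roughly 15 bytes
# (timestamp, name/value lengths, flag); name/value sizes are assumed.
overhead, name, value, saved = 15, 20, 30, 2
print("relative saving: %.1f%%" % (100.0 * saved / (overhead + name + value)))  # ~3%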


Cheers

Maxim


On 1/18/2012 11:49 PM, Ertio Lew wrote:

I believe the timestamps *on a per-column basis* are only required until
compaction time; after that it may also work if the timestamp range
could be specified globally on a per-SSTable basis, and thus the
timestamps until compaction are only required to measure the time
from the initialization of the new memtable to the point the column is
written to that memtable. Thus you can easily fit that time in 4
bytes. This I believe would save at least 4 bytes of overhead for each
column.

Is anything related to these overheads under consideration/ or planned
in the roadmap ?



On Tue, Sep 6, 2011 at 11:44 AM, Oleg Anastasyev <olega...@gmail.com> wrote:

I have a patch for trunk which I just have to get time to test a bit before I

submit.

It is for super columns and will use the super columns timestamp as the base

and only store variant encoded offsets in the underlying columns.
Could you please measure how much real benefit it brings (in real RAM
consumption by the JVM)? It is hard to tell whether it will give noticeable results or not.
AFAIK the memory structures used for the memtable consume much more memory. And a 64-bit
JVM allocates memory aligned to a 64-bit word boundary. So a 37% reduction in memory
consumption looks doubtful.






Re: About initial token, autobootstraping and load balance

2012-01-15 Thread Maxim Potekhin
I see. Sure, that's a bit more complicated and you'd have to move tokens 
after adding a machine.


Maxim


On 1/15/2012 4:40 AM, Vitalii Tymchyshyn wrote:
There's nothing wrong with it for 3 nodes. It's a problem for a cluster of 20+ 
nodes, growing.


2012/1/14 Maxim Potekhin <potek...@bnl.gov>

I'm just wondering -- what's wrong with manual specification of
tokens? I'm so glad I did it and have not had problems with
balancing and all.

Before I was indeed stuck with a 25/25/50 setup in a 3-machine
cluster, when I had to move tokens to make it 33/33/33, and I screwed
up a little in that the first one did not start with 0, which is
not a good idea.

Maxim



--
Best regards,
 Vitalii Tymchyshyn




Re: About initial token, autobootstraping and load balance

2012-01-14 Thread Maxim Potekhin
I'm just wondering -- what's wrong with manual specification of tokens? 
I'm so glad I did it and have not had problems with balancing and all.


Before I was indeed stuck with a 25/25/50 setup in a 3-machine cluster, 
when I had to move tokens to make it 33/33/33, and I screwed up a little in 
that the first one did not start with 0, which is not a good idea.
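
Laying out the tokens by hand is trivial arithmetic, by the way -- for the
RandomPartitioner it's just an even split of the 2**127 range. A quick
sketch (not an official tool; node_count is whatever you are targeting):

# Evenly spaced initial_token values for RandomPartitioner.
node_count = 3
for i in range(node_count):
    print("node %d: initial_token = %d" % (i, i * (2 ** 127) // node_count))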


Maxim

On 1/13/2012 2:10 PM, David McNelis wrote:

The documentation for that section needs to be updated...

What happens is that if you just autobootstrap without setting a token 
it will by default bisect the range of the largest node.


So if you go through several iterations of adding nodes, then this is 
what you would see:


Gen 1:
Node A:  100% of tokens, token range 1-10 (for example)

Gen 2:
Node A: 50% of tokens  (1-5)
Node B: 50% of tokens (6-10)

Gen 3:
Node A: 25% of tokens (1-2.5)
Node B: 50% of tokens (6-10)
Node C: 25% of tokens (2.6-5)

In reality, what you'd want in gen 3 is every node to be 33%, but it 
would not be the case without setting the tokens to begin with.


You'll notice that there are a couple of scripts available to generate 
a list of initial tokens for your particular cluster size; then every 
time you add a node you'll need to update all the nodes with new 
tokens in order to properly load balance.


Does this make sense?

Other folks, am I explaining this correctly?

David

2012/1/13 Carlos Pérez Miguel <cperez...@gmail.com>


Hello,

I have a doubt about how the initial token is determined. In Cassandra's
documentation it is said that it is better to manually configure the
initial token for each node in the system, but it is also said that if the
initial token is not defined and autobootstrap is true, new nodes
choose an initial token in order to improve the load balance of the
cluster. But what happens if no initial token is chosen and
autobootstrap is not activated? How does each node select its initial
token to balance the ring?

I ask this because I am running tests with a 20-node Cassandra cluster
on Cassandra 0.7.9. No node has an initial token, nor
autobootstrapping. I restart the cluster with each test I want to make
and in the end the cluster is always well balanced.

Thanks

Carlos Pérez Miguel






Exception thrown during repair, contains jmx classes -- why?

2012-01-11 Thread Maxim Potekhin
As per the trace below, there is jmx.mbeanserver involved. What I ran was a 
regular repair.

Is that right? What does this failure indicate?

at 
org.apache.cassandra.service.StorageService.forceTableRepair(StorageService.java:1613)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at 
com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
at 
com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at 
sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)

at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)





Re: Should I throttle deletes?

2012-01-10 Thread Maxim Potekhin

Thanks, this makes sense. I'll try that.
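
Concretely, something like this is what I have in mind on my side -- small
batches plus a short pause in between (pycassa, untested sketch, keyspace
and CF names illustrative):

# Delete keys in small batches with a pause, instead of one huge mutation.
import time
import pycassa

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'files')

def throttled_delete(keys, batch_size=20, pause=0.1):
    batch = cf.batch(queue_size=batch_size)   # auto-flushes every batch_size mutations
    for i, key in enumerate(keys, 1):
        batch.remove(key)
        if i % batch_size == 0:
            time.sleep(pause)                 # give the cluster room to breathe
    batch.send()                              # flush whatever is left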

Maxim

On 1/6/2012 10:51 AM, Vitalii Tymchyshyn wrote:
Do you mean on writes? Yes, your timeouts must be set so that your write 
batch can complete before the timeout elapses. But this will lower the write 
load, so reads should not time out.


Best regards, Vitalii Tymchyshyn

On 06.01.12 17:37, Philippe wrote:


But you will then get timeouts.

On 6 Jan 2012 15:17, Vitalii Tymchyshyn <tiv...@gmail.com> wrote:


On 05.01.12 22:29, Philippe wrote:


Then I do have a question, what do people generally use as
the batch size?

I used to do batches from 500 to 2000 like you do.
After investigating issues such as the one you've encountered
I've moved to batches of 20 for writes and 256 for reads.
Everything is a lot smoother : no more timeouts.


I'd rather reduce the mutation thread pool with the concurrent_writes
setting. This will lower server load no matter how many clients
are sending batches, and at the same time you still have good batching.

Best regards, Vitalii Tymchyshyn







Re: How does Cassandra decide when to do a minor compaction?

2012-01-07 Thread Maxim Potekhin

Hello Alexandru,

I just want to have a feel for what activity to expect on the cluster.
The load from minor compactions is not overwhelming, but it
seems non-negligible.

Maxim


On 1/7/2012 5:12 AM, Alexandru Sicoe wrote:

Hi Maxim,
 Why do you need to know this?

Cheers,
Alex

On Sat, Jan 7, 2012 at 10:03 AM, aaron morton <aa...@thelastpickle.com> wrote:



http://www.datastax.com/docs/1.0/operations/tuning#tuning-compaction


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 7/01/2012, at 3:17 PM, Maxim Potekhin wrote:


The subject says it all -- pointers appreciated.

Thanks

Maxim








How to find out when a nodetool operation has ended?

2012-01-06 Thread Maxim Potekhin

Suppose I start a repair on one or a few nodes in my cluster,
from an interactive machine in the office, and leave for the day
(which is a very realistic scenario imho).

Is there a way to know, from a remote machine, when a particular
action, such as a compaction or repair, has finished?

I figured that the compaction stats can be mum at times, so
they're not a reliable indicator.

Many thanks,

Maxim



Re: How to find out when a nodetool operation has ended?

2012-01-06 Thread Maxim Potekhin

Thanks, so I take it there is no solution outside of OpsCenter.

I mean, of course I can redirect the output, with additional timestamps 
if needed, to a log file -- which I can access remotely. I just thought 
there would be some status command, by chance, to tell me what maintenance 
the node is doing. Too bad there is not!

Maxim


On 1/6/2012 5:40 PM, R. Verlangen wrote:

You might consider:
- installing DataStax OpsCenter ( 
http://www.datastax.com/products/opscenter )
- starting the repair in a linux screen (so you can attach to the 
screen from another location)






How does Cassandra decide when to do a minor compaction?

2012-01-06 Thread Maxim Potekhin

The subject says it all -- pointers appreciated.

Thanks

Maxim



Re: Should I throttle deletes?

2012-01-05 Thread Maxim Potekhin

Hello Aaron,

On 1/5/2012 4:25 AM, aaron morton wrote:

I use a batch mutator in Pycassa to delete ~1M rows based on
a longish list of keys I'm extracting from an auxiliary CF (with no
problem of any sort).

What is the size of the deletion batches ?


2000 mutations.





Now, it appears that such a head-on delete puts a temporary
but large load on the cluster. I have SSDs and they go to 100%
utilization, and the CPU spikes to significant loads.

Does the load spike during the deletion or after it ?


During.



Do any of the thread pool back up in nodetool tpstats during the load ?


Haven't checked, thank you for the lead.


I can think of a few general issues you may want to avoid:

* Each row in a batch mutation is handled by a task in a thread pool 
on the nodes. So if you send a batch to delete 1,000 rows it will put 
1,000 tasks in the Mutation stage. This will reduce the query throughput.


Aah. I didn't know that. I was under the impression that batching saves 
the communication overhead, and that's it.


Then I do have a question, what do people generally use as the batch size?

Thanks

Maxim




Re: Should I throttle deletes?

2012-01-05 Thread Maxim Potekhin
Thanks, that's quite helpful. I'm wondering though if multiplying the 
number of clients will end up doing the same thing.

On 1/5/2012 3:29 PM, Philippe wrote:


Then I do have a question, what do people generally use as the
batch size?

I used to do batches from 500 to 2000 like you do.
After investigating issues such as the one you've encountered I've 
moved to batches of 20 for writes and 256 for reads. Everything is a 
lot smoother : no more timeouts.


The downside though is that I have to run more client threads in 
parallel to maximize throughput.


Cheers




Re: Strange OOM when doing list in CLI

2012-01-04 Thread Maxim Potekhin

Ed,

thanks for a dose of common sense, I should have thunk about it.

In fact, I only have 2 columns in that one particular CF, but one of 
these can get really fat (for a good reason). So the CLI just plain runs 
out of memory when pulling the default 100 rows (with a little help from 
various overheads). It didn't happen before; the recent additions 
to the data are slightly fatter than the early ones.


Thanks

Maxim



On 1/3/2012 10:27 PM, Edward Capriolo wrote:
What you are probably running into is that list from the cli can bring 
all the columns of a key into memory. I have counters using composite 
keys and about 1k columns causes this to happen. We should have some 
paging support with list.


On Tuesday, January 3, 2012, Maxim Potekhin <potek...@bnl.gov> wrote:
 I came back from Xmas vacation only to see that what always was an 
innocuous procedure in CLI now reliably results in OOM -- does anyone 
have ideas why?

 It never happened before. Version of Cassandra is 0.8.8.

 2956 java -ea 
-javaagent:/home/cassandra/cassandra/bin/../lib/jamm-0.2.2.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8000M 
-Xmx8000M -Xmn2000M -XX:+HeapDumpOnOutOfMemoryError -Xss128k 
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled 
-XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 
-XX:CMSInitiatingOccupancyFraction=75 
-XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps 
-Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=199



 [default@PANDA] list idxR;
 Using default limit of 100
 Exception in thread main java.lang.OutOfMemoryError: Java heap space
at 
org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:140)
at 
org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at 
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at 
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at 
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at 
org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:752)
at 
org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:734)
at 
org.apache.cassandra.cli.CliClient.executeList(CliClient.java:1379)
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:266)
at 
org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)

at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)


 




Should I throttle deletes?

2012-01-04 Thread Maxim Potekhin

Now that my cluster appears to run smoothly, and after a few successful
repairs and compactions, I'm back in the business of deleting portions
of data based on their date of insertion. For reasons too lengthy to be
explained here, I don't want to use TTL.

I use a batch mutator in Pycassa to delete ~1M rows based on
a longish list of keys I'm extracting from an auxiliary CF (with no
problem of any sort).

Now, it appears that such a head-on delete puts a temporary
but large load on the cluster. I have SSDs and they go to 100%
utilization, and the CPU spikes to significant loads.

Does anyone do throttling on such a mass-delete procedure?

Thanks in advance,

Maxim



Re: Cassandra WebUI with Sources released

2012-01-03 Thread Maxim Potekhin

Congrats on what seems to be a nice piece of work, need to check it out.
Nicely complements other tools.

Maxim


On 1/2/2012 12:48 PM, Markus Wiesenbacher | Codefreun.de wrote:


Hi,

I wish you all a happy and healthy new year!

As you may remember, I coded a little GUI for Apache Cassandra. Now I 
have set up a little project homepage where you can download it, 
including the sources:


http://www.codefreun.de

http://www.codefreunde.com

Markus ;)





Strange OOM when doing list in CLI

2012-01-03 Thread Maxim Potekhin
I came back from Xmas vacation only to see that what always was an 
innocuous procedure in CLI now reliably results in OOM -- does anyone 
have ideas why?

It never happened before. Version of Cassandra is 0.8.8.

 2956 java -ea 
-javaagent:/home/cassandra/cassandra/bin/../lib/jamm-0.2.2.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8000M -Xmx8000M 
-Xmn2000M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 
-XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps 
-Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=199



[default@PANDA] list idxR;
Using default limit of 100
Exception in thread main java.lang.OutOfMemoryError: Java heap space
at 
org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:140)
at 
org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at 
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at 
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at 
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at 
org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:752)
at 
org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:734)
at 
org.apache.cassandra.cli.CliClient.executeList(CliClient.java:1379)
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:266)
at 
org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)

at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)




Re: Doubts related to composite type column names/values

2011-12-20 Thread Maxim Potekhin

With regards to the static composite, what are the major benefits compared with
string concatenation (with some convenient separator inserted)?

Thanks

Maxim


On 12/20/2011 1:39 PM, Richard Low wrote:

On Tue, Dec 20, 2011 at 5:28 PM, Ertio Lew <ertio...@gmail.com> wrote:

With regard to the composite columns stuff in Cassandra, I have the
following doubts :

1. What is the storage overhead of the composite type column names/values,

The values are the same.  For each dimension, there is 3 bytes overhead.


2. what exactly is the difference between the DynamicComposite and Static
Composite ?

Static composite type has the types of each dimension specified in the
column family definition, so all names within that column family have
the same type.  Dynamic composite type lets you specify the type for
each column, so they can be different.  There is extra storage
overhead for this and care must be taken to ensure all column names
remain comparable.





Re: Doubts related to composite type column names/values

2011-12-20 Thread Maxim Potekhin
Thank you Aaron! As long as I have plain strings, would you say that I 
would do almost as well with concatenation?


Of course I realize that mixed types are a very different case where the 
composite is very useful.
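
A toy illustration of the ordering difference, by the way -- plain Python,
nothing Cassandra-specific, just typed components vs. concatenated strings:

# Composite components compare by type; concatenated strings compare lexically.
as_components = sorted([(2, 'name'), (10, 'name')])  # [(2, 'name'), (10, 'name')]
as_strings = sorted(['2:name', '10:name'])           # ['10:name', '2:name']
print(as_components)
print(as_strings)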


Thanks

Maxim


On 12/20/2011 2:44 PM, aaron morton wrote:
Component values are compared in a type-aware fashion: an Integer is 
an Integer, not a 10-character zero-padded string.


You can also slice on the components, just like with string concat, 
but nicer. E.g. if your app is storing comments for a thing, and the 
column names have the form <comment_id, field> or <Integer, String>, 
you can slice for all properties of a comment, or all properties for 
comments between two comment_ids.


Finally, the client library knows what's going on.

Hope that helps.

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 21/12/2011, at 7:43 AM, Maxim Potekhin wrote:


With regards to the static composite, what are the major benefits compared with
string concatenation (with some convenient separator inserted)?

Thanks

Maxim


On 12/20/2011 1:39 PM, Richard Low wrote:
On Tue, Dec 20, 2011 at 5:28 PM, Ertio Lew <ertio...@gmail.com> wrote:

With regard to the composite columns stuff in Cassandra, I have the
following doubts :

1. What is the storage overhead of the composite type column 
names/values,

The values are the same.  For each dimension, there is 3 bytes overhead.

2. what exactly is the difference between the DynamicComposite and 
Static

Composite ?

Static composite type has the types of each dimension specified in the
column family definition, so all names within that column family have
the same type.  Dynamic composite type lets you specify the type for
each column, so they can be different.  There is extra storage
overhead for this and care must be taken to ensure all column names
remain comparable.









Can I slice on composite indexes?

2011-12-20 Thread Maxim Potekhin

Let's say I have rows with composite columns, like

(key1, {('xyz', 'abc'): 'colval1'},  {('xyz', 'def'): 'colval2'})
(key2, {('ble', 'meh'): 'otherval'})

Is it possible to create a composite type index such that I can query on 'xyz'
and get the first two columns?

Thanks

Maxim



Re: commit log size

2011-12-14 Thread Maxim Potekhin

Alexandru, Jeremiah --

what setting needs to be tweaked, and what's the recommended value?

I observed similar behavior this morning.

Maxim


On 11/28/2011 2:53 PM, Jeremiah Jordan wrote:
Yes, the low volume memtables are causing the problem.  Lower the 
thresholds for those tables if you don't want the commit logs to go 
crazy.


-Jeremiah

On 11/28/2011 11:11 AM, Alexandru Dan Sicoe wrote:

Hello everyone,

4 node Cassandra 0.8.5 cluster with RF=2, replica placement strategy 
= SimpleStartegy, write consistency level = ANY, 
memtable_flush_after_mins =1440; memtable_operations_in_millions=0.1; 
memtable_throughput_in_mb = 40; max_compaction_threshold =32; 
min_compaction_threshold =4;


I have one keyspace with 1 CF for all the data and 3 other small CFs 
for metadata. I am using Datastax OpsCenter to monitor my cluster so 
there is another keyspace for monitoring.


Everything works ok, the only thing I've noticed is this morning the 
commitlog of one node was 52GB, one was 25 GB and the others were 
around 3 GB. I left everything untouched and looked a couple of hours 
later and the 52GB one is now about 3GB and the 25 GB one is now 29 
GB and the other two about the same as before.


Are my commit logs growing because of small memtables which don't get 
flushed because they don't reach the operations and throughput 
limits? Then why do only some nodes exhibit this behaviour?


It would be interesting to understand how to control the size of the 
commitlog also to know how to size my commitlog disks!


Thanks,
Alex




Re: Keys for deleted rows visible in CLI

2011-12-14 Thread Maxim Potekhin
Thanks, it makes perfect sense now. Well, an option in Cassandra could 
make it optional as far as display is concerned, w/o a performance hit -- 
of course this is all unimportant.


Thanks again

Maxim


On 12/14/2011 11:30 AM, Brandon Williams wrote:

http://wiki.apache.org/cassandra/FAQ#range_ghosts

On Wed, Dec 14, 2011 at 4:36 AM, Radim Kolar <h...@sendmail.cz> wrote:

On 14.12.2011 1:15, Maxim Potekhin wrote:


Thanks. It could be hidden from a human operator, I suppose :)

I agree. Open JIRA for it.




Asymmetric load

2011-12-14 Thread Maxim Potekhin

What could be the reason I see unequal loads on a 3-node cluster?
This all started happening during repairs (which again are not going 
smoothly).


Maxim



Crazy compactionstats

2011-12-14 Thread Maxim Potekhin

Hello

I ran repair like this:

nohup repair.sh 

where repair.sh contains simply nodetool repair plus timestamp.

The process dies while dumping this:
Exception in thread main java.io.IOException: Repair command #1: some 
repair session(s) failed (see log for details).
at 
org.apache.cassandra.service.StorageService.forceTableRepair(StorageService.java:1613)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at 
com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
at 
com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at 
sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)

at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)


I still see pending tasks in nodetool compactionstats, and their number 
goes into the hundreds, which I haven't seen before.

What's going on?

Thanks

Maxim



Best way to implement indexing for high-cardinality values?

2011-12-14 Thread Maxim Potekhin

I now have a CF with extremely skinny rows (in the current implementation),
and the application will want to query by more than one column value.
The problem is that the values in a lot of cases will be high-cardinality.
One other factor is that I want to rotate data in and out of the system
in one-day buckets -- LILO in effect. The date will be one of the columns
as well.
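
For the daily buckets I'm leaning towards a manual wide-row index on top of
the main CF rather than a secondary index -- roughly this (pycassa sketch,
the index CF name is made up):

# One wide row per day: row key = day, column name = record key, empty value.
# Dropping a day then means deleting one index row plus the referenced rows.
import pycassa

pool = pycassa.ConnectionPool('PANDA', ['localhost:9160'])
data = pycassa.ColumnFamily(pool, 'files')
by_day = pycassa.ColumnFamily(pool, 'files_by_day')

def store(key, columns, day):            # day e.g. '20111214'
    data.insert(key, columns)
    by_day.insert(day, {key: ''})

def keys_for_day(day, limit=100000):
    return by_day.get(day, column_count=limit).keys()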

I had 9 indexes in mind, but I think I can pare it down to 5. At least 
one of the columns I will need to query by has values that are guaranteed 
to be unique -- there are effectively two ways to identify data for very 
different parts of the complete system. Indexing on that would be bad, 
wouldn't it?

Any advice would be appreciated.

Thanks

Maxim



show schema bombs in 0.8.6

2011-12-13 Thread Maxim Potekhin

Running cli --debug:

[default@PANDA] show schema;
null
java.lang.RuntimeException
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:310)
at 
org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)

at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)
Caused by: java.lang.NullPointerException
at 
org.apache.cassandra.cli.CliClient.showColumnMeta(CliClient.java:1716)
at 
org.apache.cassandra.cli.CliClient.showColumnFamily(CliClient.java:1686)
at 
org.apache.cassandra.cli.CliClient.showKeyspace(CliClient.java:1636)
at 
org.apache.cassandra.cli.CliClient.executeShowSchema(CliClient.java:1598)
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:250)




Keys for deleted rows visible in CLI

2011-12-13 Thread Maxim Potekhin

Hello,

I searched the archives and it appears that this question was once asked but
was not answered. I just deleted a lot of rows, and want to list in 
cli. I still see
the keys. This is not the same as getting slices, is it? Anyhow, what's 
the reason

and rationale? I run 0.8.8.

Thanks

Maxim



Re: Keys for deleted rows visible in CLI

2011-12-13 Thread Maxim Potekhin

Thanks. It could be hidden from a human operator, I suppose :)


On 12/13/2011 7:12 PM, Harold Nguyen wrote:

Hi  Maxim,

The reason for this is that if node 1 goes down while you delete 
information on node 2, node 1 will know not to repair the data back when it comes 
up again. It will know that an operation has been performed to delete the 
data.

Harold

-Original Message-
From: Maxim Potekhin [mailto:potek...@bnl.gov]
Sent: Tuesday, December 13, 2011 4:03 PM
To: user@cassandra.apache.org
Subject: Keys for deleted rows visible in CLI

Hello,

I searched the archives and it appears that this question was once asked but was not 
answered. I just deleted a lot of rows, and want to list in cli. I still see 
the keys. This is not the same as getting slices, is it? Anyhow, what's the reason and 
rationale? I run 0.8.8.

Thanks

Maxim








Deleted rows re-appearing on repair in 0.8.6

2011-12-12 Thread Maxim Potekhin

Hello,

I know that this problem used to exist in 0.8.1 --
I delete rows, run a repair and these rows are back with
a vengeance. I recall I was told that this was fixed in 0.8.6 --
is that the case? I still keep seeing that behavior.

Thanks
Maxim



Really old files in the data directory

2011-12-09 Thread Maxim Potekhin

Hello,

I varied the GC grace a few times over the period of my cluster's 
lifetime, but I never went above 10 days. I did compactions, repairs etc. 
Now, I see that some files in the data directories of the nodes that were 
there from day one carry timestamps back from July. These are files 
containing secondary indexes. But I have deleted a large portion of the 
data, so one would expect that these files must have been rebuilt once or 
many times. What's happening?

I run 0.8.6.

Thanks

Maxim



Re: Cassandra 0.8.8

2011-12-09 Thread Maxim Potekhin

Hello everyone,

so what's the update on 0.8.8?

Many thanks

Maxim


On 12/2/2011 4:49 AM, Patrik Modesto wrote:

Hi,

It's been almost 2 months since the release of the 0.8.7 version and
there are quite some changes in 0.8.8, so I'd like to ask is there a
release date?

Regards,
Patrik




forceUserDefinedCompaction -- how to use it?

2011-12-07 Thread Maxim Potekhin

Can anyone provide an example of how to use forceUserDefinedCompaction?

Thanks

Maxim





Re: exporting data from Cassandra cluster

2011-12-07 Thread Maxim Potekhin

Hello Alexandru,

as you probably know, my group is using Amazon S3 to permanently (or 
semi-permanently) park the data in CSV format, which makes it portable, 
and we can load it into anything if needed, or analyze it on its own.
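
The dump itself is essentially just a range scan written out as CSV --
something like this simplified pycassa sketch (keyspace and CF names
illustrative):

# One line per (row key, column name, value); a range scan returns each
# key once, regardless of the replication factor.
import csv
import pycassa

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'files')

with open('dump.csv', 'w') as out:
    writer = csv.writer(out)
    for key, columns in cf.get_range(buffer_size=1000):
        for name, value in columns.items():
            writer.writerow([key, name, value])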

Just my half of a Swiss centime :)

And because the S3 option is not for everybody, and since you are at 
CERN -- talk to the data people in ATLAS.

350GB seems trivial.

Regards

Maxim



On 12/7/2011 11:17 AM, Alexandru Dan Sicoe wrote:

Hello everyone.
 3 node Cassandra 0.8.5 cluster. I've left the system running in a 
production environment for long-term testing. I've accumulated about 
350GB of data with RF=2. The machines I used for the tests are older 
and need to be replaced. Because of this I need to export the data to 
a permanent location. How should I export the data? In order to reduce 
the storage space I want to export only the non-replicated data -- I 
mean, just one copy of the data (without the replicas). Is this 
possible? How?


Cheers,
Alexandru





Cassandra behavior too fragile?

2011-12-07 Thread Maxim Potekhin
OK, thanks to the excellent help of Datastax folks, some of the more 
severe inconsistencies in my Cassandra cluster were fixed (after a node 
was down and compactions failed etc).


I'm still having problems, as reported in the "Repair failure under 0.8.6" thread.

The thing is, why is it so easy for the repair process to break? OK, I admit 
I'm not sure why nodes are reported as dead once in a while, but it's 
absolutely certain that they don't simply fall off the edge or get knocked 
out for 10 minutes or anything like that. Why is there no built-in 
tolerance/retry mechanism so that a node that may seem silent for a 
minute can be contacted later, or, better yet, a different node with a 
relevant replica is contacted?


As was evident from some presentations at Cassandra-NYC yesterday, 
failed compactions and repairs are a major problem for a number of 
users. The cluster can quickly become unusable. I think it would be a 
good idea to build more robustness into these procedures.


Regards

Maxim



Re: Repair failure under 0.8.6

2011-12-05 Thread Maxim Potekhin


Basically I tweaked the phi, put in more verbose GC reporting and 
decided to do a compaction before I proceed. I'm getting this on the
node where the compaction is being run, and the system log for the other two 
nodes follows. It's obvious that the cluster is sick, but I
can't determine why -- there is no overwhelming GC evidence as far as I 
can see. I didn't start a compaction on node #3; somehow 
it attempts to do it anyhow.

===
Node #2 (compaction is being run):

 INFO [CompactionExecutor:2] 2011-12-05 14:19:36,741 
CompactionManager.java (line 608) Compacted to 
/data/cassandra_data/data/system/LocationInfo-tmp-g-72-Data.db.  967 to 
561 (~58% of

original) bytes for 4 keys.  Time: 71ms.
 INFO [main] 2011-12-05 14:19:36,941 Mx4jTool.java (line 67) mx4j 
successfuly loaded
 INFO [GossipStage:1] 2011-12-05 14:19:36,943 Gossiper.java (line 715) 
Node /130.199.185.193 has restarted, now UP again
 INFO [GossipStage:1] 2011-12-05 14:19:36,943 Gossiper.java (line 683) 
InetAddress /130.199.185.193 is now UP
 INFO [GossipStage:1] 2011-12-05 14:19:36,971 StorageService.java (line 
819) Node /130.199.185.193 state jump to normal
 INFO [GossipStage:1] 2011-12-05 14:19:36,971 Gossiper.java (line 715) 
Node /130.199.185.195 has restarted, now UP again
 INFO [GossipStage:1] 2011-12-05 14:19:36,971 Gossiper.java (line 683) 
InetAddress /130.199.185.195 is now UP
 INFO [GossipStage:1] 2011-12-05 14:19:36,974 StorageService.java (line 
819) Node /130.199.185.195 state jump to normal
 INFO [main] 2011-12-05 14:19:37,003 CassandraDaemon.java (line 115) 
Binding thrift service to cassandra02.usatlas.bnl.gov/130.199.185.194:9160
 INFO [main] 2011-12-05 14:19:37,016 CassandraDaemon.java (line 124) 
Using TFastFramedTransport with a max frame size of 15728640 bytes.
 INFO [main] 2011-12-05 14:19:37,018 CassandraDaemon.java (line 151) 
Using synchronous/threadpool thrift server on 
cassandra02.usatlas.bnl.gov/130.199.185.194 : 9160
 INFO [Thread-6] 2011-12-05 14:19:37,019 CassandraDaemon.java (line 
203) Listening for thrift clients...
 INFO [GossipTasks:1] 2011-12-05 14:19:50,601 Gossiper.java (line 697) 
InetAddress /130.199.185.195 is now dead.
ERROR [HintedHandoff:1] 2011-12-05 14:20:37,954 
AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
Thread[HintedHandoff:1,1,main]
java.lang.RuntimeException: java.lang.RuntimeException: Could not reach 
schema agreement with /130.199.185.193 in 6ms
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: Could not reach schema agreement 
with /130.199.185.193 in 6ms
at 
org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:293)
at 
org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:304)
at 
org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:89)
at 
org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:397)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)

... 3 more
ERROR [HintedHandoff:1] 2011-12-05 14:20:37,956 
AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
Thread[HintedHandoff:1,1,main]
java.lang.RuntimeException: java.lang.RuntimeException: Could not reach 
schema agreement with /130.199.185.193 in 6ms
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: Could not reach schema agreement 
with /130.199.185.193 in 6ms
at 
org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:293)
at 
org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:304)
at 
org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:89)
at 
org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:397)


=
Node #1 (nothing run)


 INFO [main] 2011-12-05 14:16:15,779 CassandraDaemon.java (line 115) 
Binding thrift service to cassandra01.usatlas.bnl.gov/130.199.185.193:9160
 INFO [main] 2011-12-05 14:16:15,782 CassandraDaemon.java (line 124) 
Using TFastFramedTransport with a max frame size of 15728640 bytes.
 INFO [main] 2011-12-05 14:16:15,784 CassandraDaemon.java (line 

Could not reach schema agreement... 0.8.6

2011-12-05 Thread Maxim Potekhin

Hello,

upon startup, in my cluster of 3 machines, I see similar messages in 
system.log
on each node (below). I start nodes one by one, after I ascertain the 
previous one
is online. So they can't reach schema agreement, all of them. Why? No 
unusual

load visible in Ganglia plots.

ERROR [HintedHandoff:1] 2011-12-05 19:52:17,426 
AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
Thread[Hint

edHandoff:1,1,main]
java.lang.RuntimeException: java.lang.RuntimeException: Could not reach 
schema agreement with /130.199.185.194 in 6ms
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)



Re: Repair failure under 0.8.6

2011-12-04 Thread Maxim Potekhin

I capped heap and the error is still there. So I keep seeing node dead
messages even when I know the nodes were OK. Where and how do I tweak
timeouts?


9d-cfc9-4cbc-9f1d-1467341388b8, endpoint /130.199.185.193 died
 INFO [GossipStage:1] 2011-12-04 00:26:16,362 Gossiper.java (line 683) 
InetAddress /130.199.185.193 is now UP
ERROR [AntiEntropySessions:1] 2011-12-04 00:26:16,518 
AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
Thread[Anti\

EntropySessions:1,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Problem during repair 
session manual-repair-a6a655dc-63f0-4c1c-9c0b-0621f5692ba2, \

endpoint /130.199.185.194 died
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Problem during repair session 
manual-repair-a6a655dc-63f0-4c1c-9c0b-0621f5692ba2, endpoint /130.199\

.185.194 died
at 
org.apache.cassandra.service.AntiEntropyService$RepairSession.failedNode(AntiEntropyService.java:712)
at 
org.apache.cassandra.service.AntiEntropyService$RepairSession.convict(AntiEntropyService.java:749)
at 
org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:155)
at 
org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:527)

at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:57)
at 
org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:157)



On 12/3/2011 8:34 PM, Maxim Potekhin wrote:

Thank you Peter. Before I look into details as you suggest,
may I ask what you mean by "automatically restarted"? The way
the box and Cassandra are set up in my case is such that the
death of either is final.

Also, how do I look for full GC? I just realized that in the latest
install, I might have omitted capping the heap size -- and the
nodes have 48GB each. I guess this could be a problem, precipitating
GC death, right?

Thank you

Maxim


On 12/3/2011 7:46 PM, Peter Schuller wrote:
quite understand how Cassandra declared a node dead (in the below). Was it a
timeout? How do I fix that?

I was about to respond to say that repair doesn't fail just due to
failure detection, but this appears to have been broken by
CASSANDRA-2433 :(

Unless there is a subtle bug the exception you're seeing should be
indicative that it really was considered Down by the node. You might
grep the log for references to the node in question (UP or DOWN) to
confirm. The question is why though. I would check if the node has
maybe automatically restarted, or went into full GC, etc.





Re: Repair failure under 0.8.6

2011-12-04 Thread Maxim Potekhin

Thanks Peter!

I will try to increase phi_convict -- I will just need to restart the 
cluster after the edit, right?

I do recall that I see nodes temporarily marked as down, only to pop up 
later.


In the current situation, there is no load on the cluster at all, 
outside of maintenance like the repair.

How do I configure the print level for the GC report?

Thank you,
Maxim


On 12/4/2011 2:09 PM, Peter Schuller wrote:

I capped heap and the error is still there. So I keep seeing node dead
messages even when I know the nodes were OK. Where and how do I tweak
timeouts?

You can increase phi_convict_threshold in the configuration. However,
I would rather want to find out why they are being marked as down to
begin with. In a healthy situation, especially if you are not putting
extreme load on the cluster, there is very little reason for hosts to
be marked as down unless there's some bug somewhere.

Is this cluster under constant traffic? Are you seeing slow requests
from the point of view of the client (indicating that some requests
are routed to nodes that are temporarily inaccessible)?

With respect to GC, I would recommend running with -XX:+PrintGC and
-XX:+PrintGCDetails and -XX:+PrintGCTimeStamps and
-XX:+PrintGCDateStamps and then look at the system log. A fallback to
full GC should be findable by grepping for Full.

Also, is this a problem with one specific host, or is it happening to
all hosts every now and then? And I mean either the host being flagged
as down, or the host that is flagging others as down.

As for uncapped heap: Generally a larger heap is not going to make it
more likely to fall back to full GC; usually the opposite is true.
However, a larger heap can make some of the non-full GC pauses longer,
depending. In either case, running with the above GC options will
give you specific information on GC pauses and should allow you to
rule that out (or not).





Re: Repair failure under 0.8.6

2011-12-04 Thread Maxim Potekhin

Please disregard the GC part of the question -- I found it.

On 12/4/2011 4:12 PM, Maxim Potekhin wrote:

Thanks Peter!

I will try to increase phi_convict -- I will just need to restart the 
cluster after the edit, right?

I do recall that I see nodes temporarily marked as down, only to pop 
up later.


In the current situation, there is no load on the cluster at all, 
outside of maintenance like the repair.

How do I configure the print level for the GC report?

Thank you,
Maxim


On 12/4/2011 2:09 PM, Peter Schuller wrote:
I capped heap and the error is still there. So I keep seeing node 
dead

messages even when I know the nodes were OK. Where and how do I tweak
timeouts?

You can increase phi_convict_threshold in the configuration. However,
I would rather want to find out why they are being marked as down to
begin with. In a healthy situation, especially if you are not putting
extreme load on the cluster, there is very little reason for hosts to
be marked as down unless there's some bug somewhere.

Is this cluster under constant traffic? Are you seeing slow requests
from the point of view of the client (indicating that some requests
are routed to nodes that are temporarily inaccessible)?

With respect to GC, I would recommend running with -XX:+PrintGC and
-XX:+PrintGCDetails and -XX:+PrintGCTimeStamps and
-XX:+PrintGCDateStamps and then look at the system log. A fallback to
full GC should be findable by grepping for Full.

Also, is this a problem with one specific host, or is it happening to
all hosts every now and then? And I mean either the host being flagged
as down, or the host that is flagging others as down.

As for uncapped heap: Generally a larger heap is not going to make it
more likely to fall back to full GC; usually the opposite is true.
However, a larger heap can make some of the non-full GC pauses longer,
depending. In either case, running with the above GC options will
give you specific information on GC pauses and should allow you to
rule that out (or not).





Re: can not create a column family named 'index'

2011-12-04 Thread Maxim Potekhin
I seem to recall problems when using a cf called indexRegistry, don't 
remember much detail now.

Maxim

On 11/30/2011 7:24 PM, Shu Zhang wrote:

Hi, just wondering if this is intentional:

[default@test] create column family index;
Syntax error at position 21: mismatched input 'index' expecting set null
[default@test] create column family idx;
b9aae960-1bb2-11e1--bf27a177f2f6
Waiting for schema agreement...
... schemas agree across the cluster

Thanks,
Shu




Re: Repair failure under 0.8.6

2011-12-04 Thread Maxim Potekhin

As a side effect of the failed repair (so it seems) the disk usage on the
affected node prevents compaction from working. It still works on
the remaining nodes (we have 3 total).
Is there a way to scrub the extraneous data?

Thanks

Maxim


On 12/4/2011 4:29 PM, Peter Schuller wrote:


I will try to increase phi_convict -- I will just need to restart the
cluster after
the edit, right?

You will need to restart the nodes for which you want the phi convict
threshold to be different. You might want to do on e.g. half of the
cluster to do A/B testing.


I do recall that I see nodes temporarily marked as down, only to pop up
later.

I recommend grepping through the logs on all the clusters (e.g., cat
/var/log/cassandra/cassandra.log | grep UP | wc -l). That should tell
you quickly whether they all seem to be seeing roughly as many node
flaps, or whether some particular node or set of nodes is/are
over-represented.

Next, look at the actual nodes flapping (remove wc -l) and see if all
nodes are flapping or if it is a single node, or a subset of the nodes
(e.g., sharing a switch perhaps).


In the current situation, there is no load on the cluster at all, outside
the
maintenance like the repair.

Ok. So what i'm getting at then is that there may be real legitimate
connectivity problems that you aren't noticing in any other way since
you don't have active traffic to the cluster.






Repair failure under 0.8.6

2011-12-03 Thread Maxim Potekhin
Please help -- I've been having pretty consistent failures that look 
like this one, and I don't know how to proceed.
The text below comes from the system log. The cluster was all up before and 
after the attempted repair, so I don't
quite understand how Cassandra declared a node dead (in the below). Was 
it a timeout? How do I fix that?



Thanks,
Maxim

 INFO [GossipStage:1] 2011-12-02 17:12:07,293 Gossiper.java (line 683) 
InetAddress /130.199.185.194 is now UP
ERROR [AntiEntropySessions:1] 2011-12-02 17:12:07,354 
AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
Thread[AntiEntropySessions:1,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Problem during repair 
session manual-repair-618fad49-387f-44df-a25e-aa57b314768a, endpoint 
/130.199.185.194 died
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Problem during repair session 
manual-repair-618fad49-387f-44df-a25e-aa57b314768a, endpoint 
/130.199.185.194 died
at 
org.apache.cassandra.service.AntiEntropyService$RepairSession.failedNode(AntiEntropyService.java:712)
at 
org.apache.cassandra.service.AntiEntropyService$RepairSession.convict(AntiEntropyService.java:749)
at 
org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:155)
at 
org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:527)

at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:57)
at 
org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:157)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)

at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)

... 3 more
 INFO [AntiEntropyStage:1] 2011-12-02 17:12:07,392 
AntiEntropyService.java (line 215) Sending AEService tree for 
#TreeRequest manual-repair-c721c217-4b70-4a15-91fc-374b39b8b05\
3, cassandra03.usatlas.bnl.gov/130.199.185.195, (PANDA,files), 
(56713727820156410577229101238628035242,113427455640312821154458202477256070484]





Re: Repair failure under 0.8.6

2011-12-03 Thread Maxim Potekhin

Thank you Peter. Before I look into details as you suggest,
may I ask what you mean by "automatically restarted"? The way
the box and Cassandra are set up in my case is such that the
death of either is final.

Also, how do I look for full GC? I just realized that in the latest
install, I might have omitted capping the heap size -- and the
nodes have 48GB each. I guess this could be a problem, precipitating
GC death, right?

Thank you

Maxim


On 12/3/2011 7:46 PM, Peter Schuller wrote:

quite understand how Cassandra declared a node dead (in the below). Was it a
timeout? How do I fix that?

I was about to respond to say that repair doesn't fail just due to
failure detection, but this appears to have been broken by
CASSANDRA-2433 :(

Unless there is a subtle bug the exception you're seeing should be
indicative that it really was considered Down by the node. You might
grep the log for references to the node in question (UP or DOWN) to
confirm. The question is why though. I would check if the node has
maybe automatically restarted, or went into full GC, etc.





How many indexes to keep? Guidelines

2011-11-29 Thread Maxim Potekhin
As a matter of practice, how many secondary indexes on a CF do you 
usually keep?

What are rules of thumb? Is 10 too many? 100? 1000?

Thanks

Maxim



Re: Yanking a dead node

2011-11-29 Thread Maxim Potekhin

Thanks! Looks pretty obvious in retrospect...

Regards,

Maxim


On 11/24/2011 6:54 AM, Filipe Gonçalves wrote:

Just remove its token from the ring using

nodetool removetoken <token>

2011/11/23 Maxim Potekhin <potek...@bnl.gov>:

This was discussed a long time ago, but I need to know what's the state of
the art answer to that:
assume one of my few nodes is very dead. I have no resources or time to fix
it. Data is replicated
so the data is still available in the cluster. How do I completely remove
the dead node without having
to rebuild it, repair, drain and decommission?

TIA
Maxim









Yanking a dead node

2011-11-23 Thread Maxim Potekhin
This was discussed a long time ago, but I need to know what's the state 
of the art answer to that:
assume one of my few nodes is very dead. I have no resources or time to 
fix it. Data is replicated
so the data is still available in the cluster. How do I completely 
remove the dead node without having

to rebuild it, repair, drain and decommission?

TIA
Maxim



7199

2011-11-22 Thread Maxim Potekhin

Hello,

I have this in my cassandra-env.sh

JMX_PORT=7199

Does this mean that if I use nodetool from another node, it will try to 
connect to that

particular port?

Thanks,

Maxim



Re: 7199

2011-11-22 Thread Maxim Potekhin

Thanks. I'm trying to look up HttpAdaptor and what it does,
can you give any pointers? Thanks. I didn't find much useful
info just yet.

Maxim


On 11/22/2011 9:52 PM, Jeremiah Jordan wrote:

Yes, that is the port nodetool needs to access.


On Nov 22, 2011, at 8:43 PM, Maxim Potekhin wrote:


Hello,

I have this in my cassandra-env.sh

JMX_PORT=7199

Does this mean that if I use nodetool from another node, it will try to connect 
to that
particular port?

Thanks,

Maxim





Re: read performance problem

2011-11-19 Thread Maxim Potekhin

Try to see if there is a lot of paging going on,
and run some benchmarks on the disk itself.
Are you running Windows or Linux? Do you think
the disk may be fragmented?


Maxim


On 11/19/2011 8:58 PM, Kent Tong wrote:

Hi,

On my computer with 2G RAM and a core 2 duo CPU E4600 @ 2.40GHz, I am 
testing the
performance of Cassandra. The write performance is good: It can write 
a million records
in 10 minutes. However, the query performance is poor and it takes 10 
minutes to read
10K records with sequential keys from 0 to  (about 100 QPS). 
This is far away from

the 3,xxx QPS found on the net.

Cassandra decided to use 1G as the Java heap size which seems to be 
fine as at the end

of the benchmark the swap was barely used (only 1M used).

I understand that my computer may not be as powerful as those used in 
the other benchmarks,

but it shouldn't be that far off (1:30), right?

Any suggestion? Thanks in advance!





A Cassandra CLI question: null vs 0 rows

2011-11-17 Thread Maxim Potekhin

Hello everyone,

I run a query on a secondary index. For some queries, I get 0 rows 
returned. In other cases,

I just get a string that reads null.

What's going on?

TIA

Maxim



Re: A Cassandra CLI question: null vs 0 rows

2011-11-17 Thread Maxim Potekhin
Thanks Jonathan. I get the below error. I don't have a clue as to what it 
means.



null
java.lang.RuntimeException
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:310)
at 
org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)

at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)
Caused by: java.lang.RuntimeException
at 
org.apache.cassandra.cli.CliClient.executeGetWithConditions(CliClient.java:814)
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:208)

... 2 more


On 11/17/2011 12:28 PM, Jonathan Ellis wrote:

If CLI returns null it means there was an error -- run with --debug to
check the exception.

On Thu, Nov 17, 2011 at 11:20 AM, Maxim Potekhinpotek...@bnl.gov  wrote:

Hello everyone,

I run a query on a secondary index. For some queries, I get 0 rows returned.
In other cases,
I just get a string that reads null.

What's going on?

TIA

Maxim









What sort of load do the tombstones create on the cluster?

2011-11-17 Thread Maxim Potekhin
In view of my unpleasant discovery last week that deletions in Cassandra 
lead to a very real

and serious performance loss, I'm working on a strategy of moving forward.

If the tombstones do cause such a problem, where should I be looking for 
performance bottlenecks?
Is it disk, CPU or something else? Thing is, I don't see anything 
outstanding in my Ganglia plots.


TIA,

Maxim



Varying number of rows coming from same query on same database

2011-11-17 Thread Maxim Potekhin

Hello,

I'm running the same query repeatedly. It's a secondary index query,
done from a Pycassa client. I see that when I iterate the result object,
I get a slightly different number of entries when running the test serially.
There are no deletions in the database, and no writes; it's static for now.

Any comments will be appreciated.

Maxim



Re: Data Model Design for Login Service

2011-11-17 Thread Maxim Potekhin

1122: {
  gender: MALE
  birthdate: 1987.11.09
  name: Alfred Tester
  pwd: e72c504dc16c8fcd2fe8c74bb492affa
  alias1: alfred.tes...@xyz.de
  alias2: alf...@aad.de
  alias3: a...@dd.de
 }

...and you can use secondary indexes to query on anything.
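
For concreteness, a minimal Pycassa sketch of that kind of lookup (the 
keyspace, column family and column names here are illustrative, not an 
actual schema):

import pycassa
from pycassa.index import create_index_clause, create_index_expression

pool = pycassa.ConnectionPool('LoginService', ['localhost:9160'])
users = pycassa.ColumnFamily(pool, 'users')

# one row per customer, keyed by the numeric customer ID
users.insert('1122', {'gender': 'MALE',
                      'name': 'Alfred Tester',
                      'pwd': 'e72c504dc16c8fcd2fe8c74bb492affa',
                      'alias1': 'alfred.tester@xyz.de'})

# look the customer up by login name via a secondary index on alias1
clause = create_index_clause(
    [create_index_expression('alias1', 'alfred.tester@xyz.de')], count=1)
for key, columns in users.get_indexed_slices(clause):
    print key, columns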

Maxim


On 11/17/2011 4:08 PM, Maciej Miklas wrote:

Hello all,

I need your help to design a structure for a simple login service. It 
contains about 100.000.000 customers and each one can have about 10 
different logins - this results in 1.000.000.000 different logins.


Each customer contains following data:
- one to many login names as string, max 20 UTF-8 characters long
- ID as long - one customer has only one ID
- gender
- birth date
- name
- password as MD5

Login process needs to find user by login name.
Data in Cassandra is replicated - this is necessary to obtain all 
required login data in single call. Also usually we expect low write 
traffic and heavy read traffic - round trips for reading data should 
be avoided.
Below I've described two possible cassandra data models based on 
example: we have two users, first user has two logins and second user 
has three logins


A) Skinny rows
 - row key contains login name - this is the main search criteria
 - login data is replicated - each possible login is stored as single 
row which contains all user data - 10 logins for single customer 
create 10 rows, where each row has different key and the same content


// first 3 rows has different key and the same replicated data
alfred.tes...@xyz.de {
  id: 1122
  gender: MALE
  birthdate: 1987.11.09
  name: Alfred Tester
  pwd: e72c504dc16c8fcd2fe8c74bb492affa
},
alf...@aad.de {
  id: 1122
  gender: MALE
  birthdate: 1987.11.09
  name: Alfred Tester
  pwd: e72c504dc16c8fcd2fe8c74bb492affa
},
a...@dd.de {
  id: 1122
  gender: MALE
  birthdate: 1987.11.09
  name: Alfred Tester
  pwd: e72c504dc16c8fcd2fe8c74bb492affa
},

// two following rows has again the same data for second customer
manf...@xyz.de {
  id: 1133
  gender: MALE
  birthdate: 1997.02.01
  name: Manfredus Maximus
  pwd: e44c504ff16c8fcd2fe8c74bb492adda
},
rober...@xyz.de {
  id: 1133
  gender: MALE
  birthdate: 1997.02.01
  name: Manfredus Maximus
  pwd: e44c504ff16c8fcd2fe8c74bb492adda
}

B) Rows grouped by alphabetical prefix
- Number of rows is limited - for example first letter from login name
- Each row contains all logins which begin with the row key - row with 
key 'a' contains all logins which begin with 'a'
- Data might be unbalanced, but we avoid skinny rows - this might have 
positive performance impact (??)
- to avoid super columns, each row directly contains columns, where the 
column name is the user login and the column value is the corresponding data 
in some kind of serialized form (I would like it to be human readable)


a {
alfred.tes...@xyz.de: 1122;MALE;1987.11.09;Alfred Tester;e72c504dc16c8fcd2fe8c74bb492affa,
alf...@aad.de: 1122;MALE;1987.11.09;Alfred Tester;e72c504dc16c8fcd2fe8c74bb492affa,
a...@dd.de: 1122;MALE;1987.11.09;Alfred Tester;e72c504dc16c8fcd2fe8c74bb492affa
  },

m {
manf...@xyz.de: 1133;MALE;1997.02.01;Manfredus Maximus;e44c504ff16c8fcd2fe8c74bb492adda
  },

r {
rober...@xyz.de: 1133;MALE;1997.02.01;Manfredus Maximus;e44c504ff16c8fcd2fe8c74bb492adda

  }

Which solution is better, especially for read performance? Do 
you have a better idea?


Thanks,
Maciej




Re: A Cassandra CLI question: null vs 0 rows

2011-11-17 Thread Maxim Potekhin
Should I file a ticket? I consistently see this behavior after a mass 
delete.


On 11/17/2011 12:46 PM, Maxim Potekhin wrote:
Thanks Jonathan. I get the below error. I don't have a clue as to what 
it means.



null
java.lang.RuntimeException
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:310)
at 
org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)

at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)
Caused by: java.lang.RuntimeException
at 
org.apache.cassandra.cli.CliClient.executeGetWithConditions(CliClient.java:814)
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:208)

... 2 more


On 11/17/2011 12:28 PM, Jonathan Ellis wrote:

If CLI returns null it means there was an error -- run with --debug to
check the exception.

On Thu, Nov 17, 2011 at 11:20 AM, Maxim Potekhinpotek...@bnl.gov  
wrote:

Hello everyone,

I run a query on a secondary index. For some queries, I get 0 rows 
returned.

In other cases,
I just get a string that reads null.

What's going on?

TIA

Maxim









Re: Mass deletion -- slowing down

2011-11-14 Thread Maxim Potekhin
Thanks for the note. Ideally I would not like to keep track of what is 
the oldest indexed date,
because this means that I'm already creating a bit of infrastructure on 
top of my database,

with attendant referential integrity problems.

But I suppose I'll be forced to do that. In addition, I'll have to wait 
until the grace period is over and compact,
removing the tombstones and finally clearing the disk (which is what I 
need to do in the first place).


Frankly, this whole situation for me illustrates a very real deficiency 
in Cassandra -- one would think that
deleting less than one percent of data shouldn't really lead to complete 
failures in certain indexed queries.

That's bad.

Maxim



On 11/14/2011 3:01 AM, Guy Incognito wrote:
i think what he means is...do you know what day the 'oldest' day is?  
eg if you have a rolling window of say 2 weeks, structure your query 
so that your slice range only goes back 2 weeks, rather than to the 
beginning of time.  this would avoid iterating over all the tombstones 
from prior to the 2 week window.  this wouldn't work if you are 
deleting arbitrary days in the middle of your date range.


On 14/11/2011 02:02, Maxim Potekhin wrote:

Thanks Peter,

I'm not sure I entirely follow. By the oldest data, do you mean the
primary key corresponding to the limit of the time horizon? 
Unfortunately,
unique IDs and the timestamps do not correlate in the sense that 
chronologically
newer entries might have a smaller sequential ID. That's because 
the timestamp
corresponds to the last update, which is stochastic in the sense that 
the jobs can take
from seconds to days to complete. As I said I'm not sure I understood 
you

correctly.

Also, I note that queries on different dates (i.e. not contaminated 
with lots

of tombstones) work just fine, which is consistent with the picture that
emerged so far.

Theoretically -- would compaction or cleanup help?

Thanks

Maxim




On 11/13/2011 8:39 PM, Peter Schuller wrote:
I do limit the number of rows I'm asking for in Pycassa. Queries on 
primary

keys still work fine,

Is it feasible in your situation to keep track of the oldest possible
data (for example, if there is a single sequential writer that rotates
old entries away it could keep a record of what the oldest might be)
so that you can bound your index lookup >= that value (and avoid the
tombstones)?







Re: Mass deletion -- slowing down

2011-11-13 Thread Maxim Potekhin
I've done more experimentation and the behavior persists: I start with a 
normal dataset which is searchable by a secondary index. I select by 
that index the entries that match a certain criterion, then delete 
those. I tried two methods of deletion -- individual cf.remove() as well 
as batch removal in Pycassa.
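
For reference, the select-then-delete pattern sketched in Pycassa (keyspace 
and column family names are assumptions; this is not the actual script):

import pycassa
from pycassa.index import create_index_clause, create_index_expression

pool = pycassa.ConnectionPool('PANDA', ['localhost:9160'])
files = pycassa.ColumnFamily(pool, 'files')

# select the keys of all rows tagged with one DATE value through the
# secondary index, then remove them through a batch mutator
clause = create_index_clause([create_index_expression('DATE', '20111110')],
                             count=1000)
batch = files.batch(queue_size=1000)
for key, _ in files.get_indexed_slices(clause):
    batch.remove(key)
batch.send()

Note that the removals only write tombstones; index reads over the same 
DATE value then have to scan past those tombstones until they are 
compacted away after gc_grace_seconds.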
What happens after that is as follows: attempts to read the same CF, 
using the same index values start to time out in the Pycassa client 
(there is a Thrift message about a timeout). The entries not touched by 
the attempted deletion are still read just fine.


Has anyone seen such behavior?

Thanks,
Maxim

On 11/10/2011 8:30 PM, Maxim Potekhin wrote:

Hello,

My data load comes in batches representing one day in the life of a 
large computing facility.
I index the data by the day it was produced, to be able to quickly 
pull data for a specific day

within the last year or two. There are 6 other indexes.

When it comes to retiring the data, I intend to delete it for the 
oldest date and after that add
a fresh batch of data, so I control the disk space. Therein lies a 
problem -- and it may be
Pycassa related, so I also filed an issue on github -- when I select 
by 'DATE=blah' and then
do a batch remove, it works fine for a while, and then after a few 
thousand deletions (done
in batches of 1000) it grinds to a halt, i.e. I can no longer iterate 
the result, which manifests

in a timeout error.

Is that a behavior seen before? Cassandra version is 0.8.6, Pycassa 
1.3.0.


TIA,

Maxim




Re: Mass deletion -- slowing down

2011-11-13 Thread Maxim Potekhin

Thanks to all for valuable insight!

Two comments:
a) this is not actually time series data, but yes, each item has
a timestamp and thus chronological attribution.

b) so, what do you practically recommend? I need to delete
half a million to a million entries daily, then insert fresh data.
What's the right operation procedure?

For some reason I can still select on the index in the CLI, it's
the Pycassa module that gives me trouble, but I need it as this
is my platform and we are a Python shop.

Maxim



On 11/13/2011 7:22 PM, Peter Schuller wrote:

Deletions in Cassandra imply the use of tombstones (see
http://wiki.apache.org/cassandra/DistributedDeletes) and under some
circumstances reads can turn O(n) with respect to the amount of
columns deleted, depending. It sounds like this is what you're seeing.

For example, suppose you're inserting a range of columns into a row,
deleting it, and inserting another non-overlapping subsequent range.
Repeat that a bunch of times. In terms of what's stored in Cassandra
for the row you now have:

   [tomb]
   [tomb]
   [tomb]
   [tomb]
   [actual data]

If you then do something like a slice on that row with the end-points
being such that they include all the tombstones, Cassandra essentially
has to read through and process all those tombstones (for the
PostgreSQL aware: this is similar to the effect you can get if
implementing e.g. a FIFO queue, where MIN(pos) turns O(n) with respect
to the number of deleted entries until the last vacuum - improved in
modern versions).






Re: Mass deletion -- slowing down

2011-11-13 Thread Maxim Potekhin

Brandon,

thanks for the note.

Each row represents a computational task (a job) executed on the grid or 
in the cloud. It naturally has a timestamp as one of its attributes, 
representing the time of the last update. This timestamp
is used to group the data into buckets each representing one day in 
the system's activity.
I create the DATE attribute and add it to each row, e.g. it's a column 
{'DATE','2013'}.

I create an index on that column, along with a few others.
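
For concreteness, creating such an index from Pycassa looks roughly like 
this (keyspace and column family names here are just placeholders):

from pycassa.system_manager import SystemManager, UTF8_TYPE

sysmgr = SystemManager('localhost:9160')
# index the DATE column so rows can be selected one day at a time
sysmgr.create_index('PANDA', 'files', 'DATE', UTF8_TYPE)
sysmgr.close()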

Now, I want to rotate the data out of my database on a daily basis. For 
that, I need to

select on 'DATE' and then do a delete.

I do limit the number of rows I'm asking for in Pycassa. Queries on 
primary keys still work fine,
it's just the indexed queries that start to time out. I changed timeouts 
and number of retries

in the Pycassa pool, but that doesn't seem to help.

Thanks,
Maxim

On 11/13/2011 8:00 PM, Brandon Williams wrote:

On Sun, Nov 13, 2011 at 6:55 PM, Maxim Potekhinpotek...@bnl.gov  wrote:

Thanks to all for valuable insight!

Two comments:
a) this is not actually time series data, but yes, each item has
a timestamp and thus chronological attribution.

b) so, what do you practically recommend? I need to delete
half a million to a million entries daily, then insert fresh data.
What's the right operation procedure?

I'd have to know more about what your access pattern is like to give
you a fully informed answer.


For some reason I can still select on the index in the CLI, it's
the Pycassa module that gives me trouble, but I need it as this
is my platform and we are a Python shop.

This seems odd, since the rpc_timeout is the same for all clients.
Maybe pycassa is asking for more data than the cli?

-Brandon




Re: Mass deletion -- slowing down

2011-11-13 Thread Maxim Potekhin

Brandon,

it won't work in my application, as I need a few indexes on attributes
of the job. In addition, a large portion of queries is based on key-value
lookup, and that key is the unique job ID. I really can't have data packed
in one row per day.


Thanks,
Maxim

On 11/13/2011 8:34 PM, Brandon Williams wrote:

On Sun, Nov 13, 2011 at 7:25 PM, Maxim Potekhinpotek...@bnl.gov  wrote:

Each row represents a computational task (a job) executed on the grid or in
the cloud. It naturally has a timestamp as one of its attributes,
representing the time of the last update. This timestamp
is used to group the data into buckets each representing one day in the
system's activity.
I create the DATE attribute and add it to each row, e.g. it's a column
{'DATE','2013'}.

Hmm, so why is pushing this into the row key and then deleting the
entire row not acceptable? (this is what the link I gave would
prescribe)  In other words, you bucket at the row level, instead of
relying on a column attribute that needs an index.

-Brandon




Re: Mass deletion -- slowing down

2011-11-13 Thread Maxim Potekhin

Thanks Peter,

I'm not sure I entirely follow. By the oldest data, do you mean the
primary key corresponding to the limit of the time horizon? Unfortunately,
unique IDs and the timestamps do not correlate in the sense that 
chronologically
newer entries might have a smaller sequential ID. That's because the 
timestamp
corresponds to the last update, which is stochastic in the sense that the 
jobs can take

from seconds to days to complete. As I said I'm not sure I understood you
correctly.

Also, I note that queries on different dates (i.e. not contaminated 
with lots

of tombstones) work just fine, which is consistent with the picture that
emerged so far.

Theoretically -- would compaction or cleanup help?

Thanks

Maxim




On 11/13/2011 8:39 PM, Peter Schuller wrote:

I do limit the number of rows I'm asking for in Pycassa. Queries on primary
keys still work fine,

Is it feasible in your situation to keep track of the oldest possible
data (for example, if there is a single sequential writer that rotates
old entries away it could keep a record of what the oldest might be)
so that you can bound your index lookup >= that value (and avoid the
tombstones)?





Mass deletion -- slowing down

2011-11-10 Thread Maxim Potekhin

Hello,

My data load comes in batches representing one day in the life of a 
large computing facility.
I index the data by the day it was produced, to be able to quickly pull 
data for a specific day

within the last year or two. There are 6 other indexes.

When it comes to retiring the data, I intend to delete it for the oldest 
date and after that add
a fresh batch of data, so I control the disk space. Therein lies a 
problem -- and it may be
Pycassa related, so I also filed an issue on github -- when I select by 
'DATE=blah' and then
do a batch remove, it works fine for a while, and then after a few 
thousand deletions (done
in batches of 1000) it grinds to a halt, i.e. I can no longer iterate 
the result, which manifests

in a timeout error.

Is that a behavior seen before? Cassandra version is 0.8.6, Pycassa 1.3.0.

TIA,

Maxim



Is there a way to get only keys with get_indexed_slices?

2011-11-10 Thread Maxim Potekhin


Is there a way to get only keys with get_indexed_slices?
Looking at the code, it's not possible, but -- is there some way anyhow?
I don't want to extract any data, just a list of matching keys.
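
The closest workaround seems to be to request a single small column per 
row and keep only the keys -- a Pycassa sketch, with keyspace, column 
family and column names as placeholders:

import pycassa
from pycassa.index import create_index_clause, create_index_expression

pool = pycassa.ConnectionPool('PANDA', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'files')

clause = create_index_clause(
    [create_index_expression('DATE', '20111110')], count=1000)
# fetch one cheap column per matching row and throw the values away
keys = [key for key, _ in cf.get_indexed_slices(clause, columns=['DATE'])]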

TIA,

Maxim



Error connection to remote JMX agent during repair

2011-11-07 Thread Maxim Potekhin

Hello,

I'm trying to run repair on one of my nodes which needs to be 
repopulated after
a failure of the hard drive. What I'm getting is below. Note: I'm not 
loading JMX

with Cassandra, it always worked before... The version is 0.8.6.

Any help will be appreciated,

Maxim


Error connection to remote JMX agent!
java.io.IOException: Failed to retrieve RMIServer stub: 
javax.naming.CommunicationException [Root exception is 
java.rmi.ConnectIOException: error during JRMP connection establishment; 
nested exception is:

java.net.SocketTimeoutException: Read timed out]
at 
javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:338)
at 
javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:248)

at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:140)
at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:110)
at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:582)
Caused by: javax.naming.CommunicationException [Root exception is 
java.rmi.ConnectIOException: error during JRMP connection establishment; 
nested exception is:

java.net.SocketTimeoutException: Read timed out]
at 
com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:101)
at 
com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:185)

at javax.naming.InitialContext.lookup(InitialContext.java:392)
at 
javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1886)
at 
javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1856)
at 
javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:257)

... 4 more
Caused by: java.rmi.ConnectIOException: error during JRMP connection 
establishment; nested exception is:

java.net.SocketTimeoutException: Read timed out
at 
sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:286)
at 
sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)

at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:322)
at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
at 
com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:97)

... 9 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readByte(DataInputStream.java:248)
at 
sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:228)




Re: Tool for SQL - Cassandra data movement

2011-11-01 Thread Maxim Potekhin
Just a short comment -- we are going the CSV way as well because of its 
compactness and extreme portability.
The CSV files are kept in the cloud as backup. They can also find other 
uses. JSON would work as well, but

it would be at least twice as large in size.
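
As a sketch of what such a dump can look like with Pycassa (the column 
names and file layout here are assumptions, not our actual script):

import csv
import pycassa

pool = pycassa.ConnectionPool('PANDA', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'files')

# one CSV line per row: the key followed by a fixed set of columns
with open('backup.csv', 'wb') as out:
    writer = csv.writer(out)
    for key, cols in cf.get_range():
        writer.writerow([key] + [cols.get(c, '') for c in ('DATE', 'STATUS')])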

Maxim

On 9/22/2011 1:25 PM, Nehal Mehta wrote:
We are trying to do the same thing, but instead of migrating into 
JSON, we are exporting into CSV and then importing the CSV into 
Cassandra.  Which DB are you currently using?


Thanks,
Nehal Mehta.

2011/9/22 Radim Kolar h...@sendmail.cz

I need a tool which is able to dump tables via JDBC into JSON format
for Cassandra import. I am pretty sure that somebody has already written
that.

Are there tools which can do a direct JDBC to Cassandra import?






Re: CMS GC initial-mark taking 6 seconds , bad?

2011-10-20 Thread Maxim Potekhin

Hello Aaron,

I happen to have 48GB on each machine I use in the cluster. Can I 
assume that I can't really use all of this memory productively? Do you 
have any suggestions related to that? Can I run more than one instance of 
Cassandra on the same box (using different ports) to take advantage of 
this memory, assuming the disk has enough bandwidth?


Thanks,
Maxim

On 9/25/2011 11:37 AM, aaron morton wrote:

It does seem long and will be felt by your application.

Are you running a 47GB heap ? Most peeps seem to think 8 to 12 is about the 
viable maximum.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 25/09/2011, at 7:14 PM, Yang wrote:


I see the following in my GC log

1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)]
13749939K(49807360K), 6.0696680 secs] [Times: user=6.10 sys=0.00,
real=6.07 secs]

so there is a stop-the-world period of 6 seconds. Does this sound bad,
or is 6 seconds OK and we should expect the built-in
fault-tolerance of Cassandra to handle this?

Thanks
Yang




Re: hw requirements

2011-09-01 Thread Maxim Potekhin
Sorry about the unclear naming scheme. I meant that if I want to index on a 
few columns simultaneously,

I create a new column with the concatenated values of those columns.
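
A sketch of what I mean, assuming a Pycassa ColumnFamily 'jobs' with 
STATUS and DATE attributes (all names are illustrative):

import pycassa

pool = pycassa.ConnectionPool('PANDA', ['localhost:9160'])
jobs = pycassa.ColumnFamily(pool, 'jobs')

def insert_job(key, columns):
    # materialize the concatenated value as its own column; a secondary
    # index on STATUS_DATE then acts as a composite index on the pair
    columns['STATUS_DATE'] = '%s#%s' % (columns['STATUS'], columns['DATE'])
    jobs.insert(key, columns)

insert_job('job-0001', {'STATUS': 'finished', 'DATE': '20110901'})

The price is the one extra column write per insert mentioned elsewhere in 
this thread.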

On 8/31/2011 3:10 PM, Anthony Ikeda wrote:
Sorry to fork this topic, but in composite indexes do you mean as 
strings or as Composite(). I only ask cause we have started using 
the Composite as rowkeys and column names to replace the use of 
concatenated strings mainly for lookup purposes.


Anthony


On Wed, Aug 31, 2011 at 10:27 AM, Maxim Potekhin potek...@bnl.gov wrote:


Plenty of comments in this thread already, and I agree with those
saying
it depends. From my experience, a cluster with 18 spindles total
could not match the performance and throughput of our primary
Oracle server which had 108 spindles. After we upgraded to SSD,
things have definitely changed for the better, for Cassandra.

Another thing is that if you plan to implement composite indexes by
catenating column values into additional columns, that would
constitute
a write hence you'll need CPU. So watch out.



On 8/29/2011 9:15 AM, Helder Oliveira wrote:

Hello guys,

What is the typical profile of a Cassandra server?
Are SSDs an option?
Does Cassandra need a better CPU or lots of memory?
Are SATA II disks OK?

I am making some tests, and I started evaluating the possible
hardware.

If someone already has conclusions about it, please share :D

Thanks a lot.







Re: hw requirements

2011-08-31 Thread Maxim Potekhin

Plenty of comments in this thread already, and I agree with those saying
it depends. From my experience, a cluster with 18 spindles total
could not match the performance and throughput of our primary
Oracle server which had 108 spindles. After we upgraded to SSD,
things have definitely changed for the better, for Cassandra.

Another thing is that if you plan to implement composite indexes by
catenating column values into additional columns, that would constitute
a write hence you'll need CPU. So watch out.


On 8/29/2011 9:15 AM, Helder Oliveira wrote:

Hello guys,

What is the typical profile of a Cassandra server?
Are SSDs an option?
Does Cassandra need a better CPU or lots of memory?
Are SATA II disks OK?

I am making some tests, and I started evaluating the possible hardware.

If someone already has conclusions about it, please share :D

Thanks a lot.




Re: Repair taking a long, long time

2011-07-20 Thread Maxim Potekhin
I can re-load all data that I have in the cluster, from a flat-file 
cache I have

on NFS, many times faster than the nodetool repair takes. And that's not
even accurate because as other noted nodetool repair eats up disk space
for breakfast and takes more than 24hrs on 200GB data load, at which point
I have to cancel. That's not acceptable. I simply don't know what to do now.


On 7/20/2011 8:47 AM, David Boxenhorn wrote:

I have this problem too, and I don't understand why.

I can repair my nodes very quickly by looping through all my data (when 
you read your data it does read-repair), but nodetool repair takes 
forever. I understand that nodetool repair builds merkle trees, etc. 
etc., so it's a different algorithm, but why can't nodetool repair be 
smart enough to choose the best algorithm? Also, I don't understand 
what's special about my data that makes nodetool repair so much slower 
than looping through all my data.



On Wed, Jul 20, 2011 at 12:18 AM, Maxim Potekhin potek...@bnl.gov wrote:


Thanks Edward. I'm told by our IT that the switch connecting the
nodes is pretty fast.
Seriously, in my house I copy complete DVD images from my bedroom to
the living room downstairs via WiFi, and a dozen GB does not
seem like a
problem, on dirt cheap hardware (Patriot Box Office).

I also have just _one_ major column family but caveat emptor -- 8
indexes attached to
it (and there will be more). There is one accounting CF which is
small, can't possibly
make a difference.

By contrast, compaction (as in nodetool) performs quite well on
this cluster. I am starting to suspect some
sort of malfunction.

I looked at the system log during the repair; there is some
compaction agent doing
work that I'm not sure makes sense (and I didn't call for it).
Disk utilization all of a sudden goes up to 40%
per Ganglia, and stays there, this is pretty silly considering the
cluster is IDLE and we have SSDs. No external writes,
no reads. There are occasional GC stoppages, but these I can live
with.

This repair debacle has happened twice in a row now. Cr@p. I need to go
to production soon
and that doesn't look good at all. If I can't manage a system that
simple (and/or get help
on this list) I may have to cut losses i.e. stay with Oracle.

Regards,

Maxim




On 7/19/2011 12:16 PM, Edward Capriolo wrote:


Well, most SSDs are pretty fast. There is one more thing to
consider. If Cassandra determines nodes are out of sync it has
to transfer data across the network. If that is the case you
have to look at 'nodetool streams' and determine how much data
is being transferred between nodes. There are some open
tickets where with larger tables repair is streaming more than
it needs to. But even if the transfers are only 10% of your
200GB, transferring 20 GB is not trivial.

If you have multiple keyspaces and column families, repairing one
at a time might make the process more manageable.







Repair taking a long, long time

2011-07-19 Thread Maxim Potekhin
We have something of the order of 200GB load on each of 3 machines in a 
balanced cluster under 0.8.1.
I started repair about 24hrs ago and did a moderate amount of inserts 
since then (a small fraction of

data load). The repair still appears to be running. What could go wrong?

Thanks,
Maxim



