Re: Index interval tuning

2011-05-11 Thread Héctor Izquierdo Seliva
On Wed, 11-05-2011 at 14:24 +1200, aaron morton wrote: What version, and what were the values for RecentBloomFilterFalsePositives and BloomFilterFalsePositives? The bloom filter metrics are updated in SSTableReader.getPosition(); the only slightly odd thing I can see is that we do not

RE: Finding big rows

2011-05-11 Thread Meler Wojciech
Thanks for the reply. My app uses 7-bit ASCII string row keys, so I assume that they could be used directly. I'd like to fetch the whole row. I was able to dump the big row with sstable2json, but both my app and the cli are unable to read the row from Cassandra. I see in the JSON dump that all columns are marked

Re: compaction strategy

2011-05-11 Thread Terje Marthinussen
Not sure I follow you. 4 sstables is the minimum compaction looks for (by default). If there are 30 sstables of ~20MB sitting there because compaction is behind, you will compact those 30 sstables together (unless there is not enough space for that and considering you haven't changed the

Re: column bloat

2011-05-11 Thread Terje Marthinussen
On Wed, May 11, 2011 at 8:06 AM, aaron morton aa...@thelastpickle.com wrote: For a reasonably large number of use cases (for me, 2 out of 3 at the moment) supercolumns will be units of data where the columns (attributes) will never change by themselves or where the data does not change anyway

Re: Read time get worse during dynamic snitch reset

2011-05-11 Thread shimi
I finally found some time to get back to this issue. I turned on the DEBUG log on the StorageProxy and it shows that all of these requests are read from the other datacenter. Shimi On Tue, Apr 12, 2011 at 2:31 PM, aaron morton aa...@thelastpickle.com wrote: Something feels odd. From Peters

Re: Index interval tuning

2011-05-11 Thread aaron morton
What are the values for RecentBloomFilterFalsePositives and BloomFilterFalsePositives (the non-ratio ones)? - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 11 May 2011, at 19:53, Héctor Izquierdo Seliva wrote: On Wed, 11-05-2011 at

Re: Finding big rows

2011-05-11 Thread aaron morton
A couple of questions to ask. You may also get some value from the #cassandra chat room, where you can have a bit more of a conversation. - Checking: did you run nodetool scrub when upgrading to 0.7.3? (Not related to the current problem, just asking.) - What client library was used to write the

Data types for cross language access

2011-05-11 Thread Oliver Dungey
I am currently working on a system with Cassandra that is written purely in Java. I know our end solution will require other languages to access the data in Cassandra (Python, C++ etc.). What is the best way to store data to ensure I can do this? Should I serialize everything to strings/json/xml

Re: Index interval tuning

2011-05-11 Thread Héctor Izquierdo Seliva
Sorry Aaron, here are the values you requested: RecentBloomFilterFalsePositives = 5; BloomFilterFalsePositives = 385260; uptime of the node is three and a half days, more or less. On Wed, 11-05-2011 at 22:05 +1200, aaron morton wrote: What are the values for

How to invoke getNaturalEndpoints with jconsole?

2011-05-11 Thread Maki Watanabe
Hello, This is a question about jconsole rather than Cassandra: how can I invoke getNaturalEndpoints with jconsole? org.apache.cassandra.service.StorageService.Operations.getNaturalEndpoints I want to run this method to find the nodes which are responsible for storing data for a specific row key. I can find

Re: EC2 Snitch

2011-05-11 Thread Vijay
We are using this patch in our multi-region testing... yes, this approach is going to be integrated into https://issues.apache.org/jira/browse/CASSANDRA-2491 once it is committed (you might want to wait for that). Yes, this fixes the Amazon infrastructure problems and it will automatically detect the

RE: Finding big rows

2011-05-11 Thread Meler Wojciech
I didn't run nodetool scrub. My app uses the C++ thrift client (0.5.0 and 0.6.1). As this is a production environment I get a lot of "collecting %s of %s" messages, but there is no row key. I've matched it by uuid and thread - hope it is ok:

Re: Index interval tuning

2011-05-11 Thread Chris Burroughs
On 05/10/2011 10:24 PM, aaron morton wrote: What version, and what were the values for RecentBloomFilterFalsePositives and BloomFilterFalsePositives? The bloom filter metrics are updated in SSTableReader.getPosition(); the only slightly odd thing I can see is that we do not count a key

Re: How to invoke getNaturalEndpoints with jconsole?

2011-05-11 Thread Nick Bailey
As far as I know you can not call getNaturalEndpoints from jconsole because it takes a byte array as a parameter and jconsole doesn't provide a way for inputting a byte array. You might be able to use the thrift call 'describe_ring' to do what you want though. You will have to manually hash your
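A rough sketch of the "hash the key yourself, then compare against the ring" approach described above. It assumes RandomPartitioner derives a token as the absolute value of the key's MD5 digest read as a signed big-endian integer, and that a node owns the range ending at its own token. The ring here is a hypothetical list of (token, endpoint) pairs standing in for whatever describe_ring returns; the function names are made up for illustration.

```python
import bisect
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """Assumed RandomPartitioner hashing: abs(signed 128-bit MD5 of the key)."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def primary_endpoint(ring, token):
    """Return the endpoint whose token is the first one >= the given token.

    `ring` is a hypothetical list of (token, endpoint) pairs. If the token is
    larger than every node token, ownership wraps around to the lowest token.
    """
    ordered = sorted(ring)
    tokens = [node_token for node_token, _ in ordered]
    index = bisect.bisect_left(tokens, token)
    if index == len(tokens):
        index = 0  # wrap around the ring
    return ordered[index][1]


# Hypothetical two-node ring, not taken from a real cluster.
example_ring = [(0, "10.0.0.1"), (2 ** 126, "10.0.0.2")]
key_token = random_partitioner_token(b"my-row-key")
print(key_token, primary_endpoint(example_ring, key_token))
```

This only finds the first replica; the remaining replicas depend on the replication strategy in use, which the sketch does not model.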

Re: How to invoke getNaturalEndpoints with jconsole?

2011-05-11 Thread Maki Watanabe
Thanks, So my options are: 1. Write a thrift client code to call describe_ring with hashed key or 2. Write a JMX client code to call getNaturalEndpoints right? 2011/5/11 Nick Bailey n...@datastax.com: As far as I know you can not call getNaturalEndpoints from jconsole because it takes a byte

Re: How to invoke getNaturalEndpoints with jconsole?

2011-05-11 Thread Nick Bailey
Yes. On Wed, May 11, 2011 at 8:25 AM, Maki Watanabe watanabe.m...@gmail.com wrote: Thanks, So my options are: 1. Write a thrift client code to call describe_ring with hashed key or 2. Write a JMX client code to call getNaturalEndpoints right? 2011/5/11 Nick Bailey n...@datastax.com: As

Re: compaction strategy

2011-05-11 Thread Jonathan Ellis
You are of course free to reduce the min per bucket to 2. The fundamental idea of sstables + compaction is to trade disk space for higher write performance. For most applications this is the right trade to make on modern hardware... I don't think you'll get very far trying to get the 2nd without

Re: Finding big rows

2011-05-11 Thread Peter Schuller
What is the best way to find keys of such big rows? One, if not necessarily the best, way is to check system.log for large row warnings that trigger for rows large enough to be compacted lazily. Grep for 'azy' (or lazy case-insens) and you should find it. -- / Peter Schuller
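The grep suggested above can also be scripted. This is a minimal sketch that keeps only lines containing "lazy" in any letter case; the sample lines are invented placeholders, not real Cassandra log output, and the helper name is made up.

```python
import re

# Case-insensitive match for "lazy", as suggested in the message above.
LAZY_PATTERN = re.compile(r"lazy", re.IGNORECASE)


def find_lazy_lines(lines):
    """Return the lines (without trailing newlines) that mention 'lazy'."""
    return [line.rstrip("\n") for line in lines if LAZY_PATTERN.search(line)]


# Invented placeholder lines, NOT actual log messages.
sample_lines = [
    "INFO placeholder message about a lazy compaction of some row\n",
    "INFO placeholder message about something else\n",
    "WARN placeholder message: Lazy handling triggered\n",
]

for hit in find_lazy_lines(sample_lines):
    print(hit)
```

To scan a real log, pass an open file object instead of the sample list, for example `find_lazy_lines(open(path))`, where `path` is wherever system.log lives on that node.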

Online text search with Hadoop/Brisk

2011-05-11 Thread Ben Scholl
I keep reading that Hadoop/Brisk is not suitable for online querying, only for offline/batch processing. What exactly are the reasons it is unsuitable? My use case is a fairly high query load, and each query ideally would return within about 20 seconds. The queries will use indexes to narrow down

Re: How to invoke getNaturalEndpoints with jconsole?

2011-05-11 Thread Maki Watanabe
Add a new faq: http://wiki.apache.org/cassandra/FAQ#jconsole_array_arg 2011/5/11 Nick Bailey n...@datastax.com: Yes. On Wed, May 11, 2011 at 8:25 AM, Maki Watanabe watanabe.m...@gmail.com wrote: Thanks, So my options are: 1. Write a thrift client code to call describe_ring with hashed

Re: How to invoke getNaturalEndpoints with jconsole?

2011-05-11 Thread Jonathan Ellis
Thanks! On Wed, May 11, 2011 at 10:20 AM, Maki Watanabe watanabe.m...@gmail.com wrote: Add a new faq: http://wiki.apache.org/cassandra/FAQ#jconsole_array_arg 2011/5/11 Nick Bailey n...@datastax.com: Yes. On Wed, May 11, 2011 at 8:25 AM, Maki Watanabe watanabe.m...@gmail.com wrote:

Re: Data types for cross language access

2011-05-11 Thread Luke Biddell
I wouldn't mind knowing how other people are approaching this problem too. On 11 May 2011 11:27, Oliver Dungey oliver.dun...@gmail.com wrote: I am currently working on a system with Cassandra that is written purely in Java. I know our end solution will require other languages to access the

Re: Index interval tuning

2011-05-11 Thread Jonathan Ellis
Close: the problem is we don't count *any* true positives *unless* cache is enabled. Fix attached to https://issues.apache.org/jira/browse/CASSANDRA-2637. On Wed, May 11, 2011 at 7:04 AM, Chris Burroughs chris.burrou...@gmail.com wrote: On 05/10/2011 10:24 PM, aaron morton wrote: What version
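To illustrate why missing true positives distort the metric, here is a small sketch. It assumes the false-positive ratio is computed as false positives divided by the sum of false and true positives; the true-positive figure in the second call is a made-up number for comparison, and only the 385260 comes from this thread.

```python
def false_positive_ratio(false_positives, true_positives):
    """Assumed ratio: fp / (fp + tp), defined as 0.0 when nothing was counted."""
    total = false_positives + true_positives
    if total == 0:
        return 0.0
    return false_positives / total


reported_false_positives = 385260  # value posted earlier in the thread

# If true positives are never counted, every counted lookup is a false positive.
print(false_positive_ratio(reported_false_positives, 0))

# Hypothetical true-positive count, only to show how the ratio would shrink.
print(false_positive_ratio(reported_false_positives, 10_000_000))
```

With the true-positive counter stuck at zero the ratio is pinned at 1.0 regardless of how well the filter actually performs, which is why the raw counters were requested.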

Re: Data types for cross language access

2011-05-11 Thread Alex Araujo
On 5/11/11 5:27 AM, Oliver Dungey wrote: I am currently working on a system with Cassandra that is written purely in Java. I know our end solution will require other languages to access the data in Cassandra (Python, C++ etc.). What is the best way to store data to ensure I can do this? Should

Re: Data types for cross language access

2011-05-11 Thread Nate McCall
You should have no problems with byte conversion consistencies. For the serialization test cases in Hector, we verify most of the results with o.a.c.utils.ByteBufferUtil from Cassandra source. On Wed, May 11, 2011 at 10:23 AM, Luke Biddell luke.bidd...@gmail.com wrote: I wouldn't mind

Re: Data types for cross language access

2011-05-11 Thread Eric tamme
On Wed, May 11, 2011 at 10:23 AM, Luke Biddell luke.bidd...@gmail.com wrote: I wouldn't mind knowing how other people are approaching this problem too. On 11 May 2011 11:27, Oliver Dungey oliver.dun...@gmail.com wrote: I am currently working on a system with Cassandra that is written purely

Talk on DataStax Brisk on Monday at Cassandra London

2011-05-11 Thread Dave Gardner
Hi all, Any London-based people who are interested in Brisk should come along to the Cassandra London meetup on Monday. There will be a talk and live demo. http://www.meetup.com/Cassandra-London/events/16643691/ Dave

Choice of Index

2011-05-11 Thread Baskar Duraikannu
Hello - I am using 0.8 Beta 2 and have a CF containing COMPANY, ACCOUNTNUMBER and some account related data. I have index on both Company and AccountNumber. If I run a query - SELECT DATA FROM COMPANYCF WHERE COMPANY='XXX' AND ACCOUNTNUMBER = 'YYY' Even though ACCOUNTNUMBER based Index is a

Re: Online text search with Hadoop/Brisk

2011-05-11 Thread Edward Capriolo
On Wed, May 11, 2011 at 11:19 AM, Ben Scholl brsch...@gmail.com wrote: I keep reading that Hadoop/Brisk is not suitable for online querying, only for offline/batch processing. What exactly are the reasons it is unsuitable? My use case is a fairly high query load, and each query ideally would

Excessive allocation during hinted handoff

2011-05-11 Thread Gabriel Tataranu
Greetings, I'm experiencing some issues with 2 nodes (out of more than 10). Right after startup (Listening for thrift clients...) the nodes will create objects at high rate using all available CPU cores: INFO 18:13:15,350 GC for PS Scavenge: 292 ms, 494902976 reclaimed leaving 2024909864 used;

jsvc hangs shell

2011-05-11 Thread Anton Belyaev
Hello, I installed 0.7.5 on my Ubuntu 11.04 64 bit from the package at deb http://www.apache.org/dist/cassandra/debian 07x main and I ran into a really strange problem. Any shell command that requires Cassandra's jsvc command line (for example, ps -ef, or top with cmdline args) just hangs. Using strace I

Re: Choice of Index

2011-05-11 Thread Jonathan Ellis
No, Cassandra uses statistics to see which index will result in fewer rows to check. On Wed, May 11, 2011 at 12:42 PM, Baskar Duraikannu baskar.duraikannu...@gmail.com wrote: Hello - I am using 0.8 Beta 2 and have a CF containing COMPANY, ACCOUNTNUMBER and some account related data. I have

Re: Index interval tuning

2011-05-11 Thread aaron morton
Thanks A - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 12 May 2011, at 03:44, Jonathan Ellis wrote: Close: the problem is we don't count *any* true positives *unless* cache is enabled. Fix attached to

Re: jsvc hangs shell

2011-05-11 Thread jonathan . colby
We use the Java Service Wrapper from Tanuki Software and are very happy with it. It's a lot more robust than jsvc. http://wrapper.tanukisoftware.com/doc/english/download.jsp The free community version will be enough in most cases. Jon On May 11, 2011 10:30pm, Anton Belyaev

Re: jsvc hangs shell

2011-05-11 Thread Anton Belyaev
I guess it is not trivial to modify the package to make it use JSW instead of JSVC. I am still not sure that JSVC itself is the culprit. Maybe something is wrong in my setup. 2011/5/12 jonathan.co...@gmail.com: We use the Java Service Wrapper from Tanuki Software and are very happy with it. It's

Keyspace creation error on 0.8 beta2

2011-05-11 Thread Sameer Farooqui
When I run this from the Cassandra CMD-Line: create keyspace MyKeySpace with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:2}]; I get this error: Internal error processing system_add_keyspace My syntax is correct for creating the

network topology issue

2011-05-11 Thread Anurag Gujral
Hi All, I am testing the network topology strategy in Cassandra. I am using two nodes, one in each of two different data centers. Since the nodes are in different DCs, I assigned token 0 to both nodes. I added both nodes as seeds in cassandra.yaml and I am using PropertyFileSnitch

Re: Ec2 Stress Results

2011-05-11 Thread Alex Araujo
On 5/9/11 9:49 PM, Jonathan Ellis wrote: On Mon, May 9, 2011 at 5:58 PM, Alex Araujo cassandra- How many replicas are you writing? Replication factor is 3. So you're actually spot on the predicted numbers: you're pushing 20k*3=60k raw rows/s across your 4 machines. You might get another 10%
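The arithmetic in the message above, written out. The 20k client rows/s, replication factor of 3, and 4 machines come from the thread; the per-machine figure assumes replicas are spread evenly, which is an assumption of this sketch rather than something stated in the thread.

```python
client_rows_per_second = 20_000  # from the thread
replication_factor = 3           # from the thread
machine_count = 4                # from the thread

# Each client write is stored on `replication_factor` nodes.
raw_rows_per_second = client_rows_per_second * replication_factor

# Assumes an even spread of replicas across machines.
rows_per_machine = raw_rows_per_second // machine_count

print(raw_rows_per_second, rows_per_machine)
```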

Re: network topology issue

2011-05-11 Thread Sameer Farooqui
Anurag, The Cassandra ring spans datacenters, so you can't use token 0 on both nodes. Cassandra’s ring is from 0 to 2**127 in size. Try assigning one node the token of 0 and the second node 8.50705917 × 10^37 (input this as a single long number). To add a new keyspace in 0.8, run this from the
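The second token above is simply half of the ring. A small sketch, assuming the 2**127 ring size quoted in the message, that prints evenly spaced tokens as exact integers so they can be pasted in as a single long number; the helper name is made up.

```python
RING_SIZE = 2 ** 127  # ring size as stated in the message above


def evenly_spaced_tokens(node_count):
    """Return one token per node, spread evenly around the ring."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]


tokens = evenly_spaced_tokens(2)
for node_index, token in enumerate(tokens):
    print(f"node {node_index}: token = {token}")
```

For two nodes this yields 0 and 2**126, the latter being the exact integer behind the rounded 8.50705917 × 10^37.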

Re: Keyspace creation error on 0.8 beta2

2011-05-11 Thread Sameer Farooqui
FYI - creating the keyspace with the syntax below works in beta1, just not beta2. jeromatron on the IRC channel commented that it looks like the java classpath is using the wrong library dependency for commons lang in beta2. - Sameer On Wed, May 11, 2011 at 4:09 PM, Sameer Farooqui

Re: network topology issue

2011-05-11 Thread Narendra Sharma
My understanding is that the replication factor is for the entire ring. Even if you have 2 DCs the nodes are part of the same ring. What you get additionally from NTS is that you can specify how many replicas to place in each DC. So RF = 1 and DC1:1, DC2:1 looks incorrect to me. What is possible

Re: network topology issue

2011-05-11 Thread Sameer Farooqui
Yeah, Narendra is correct. If you have 2 nodes, one in each data center, use RF=2 and do reads and writes with either level ONE or QUORUM (which means 2 in this case). However, if you had 2 nodes in DC1 and 1 node in DC2, then you could use RF=3 and use LOCAL_QUORUM for reads and writes. For
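Why QUORUM "means 2 in this case": a quorum is a majority of the replicas, usually computed as floor(RF / 2) + 1. A minimal sketch, with a made-up function name:

```python
def quorum(replica_count):
    """Majority of replicas: floor(replica_count / 2) + 1."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    return replica_count // 2 + 1


for rf in (1, 2, 3):
    print(f"RF={rf}: quorum={quorum(rf)}")
```

With RF=2 a quorum needs both replicas, so QUORUM cannot tolerate either node being down, whereas ONE can.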

Re: Ec2 Stress Results

2011-05-11 Thread Adrian Cockcroft
Hi Alex, This has been a useful thread, we've been comparing your numbers with our own tests. Why did you choose four big instances rather than more smaller ones? For $8/hr you get four m2.4xl with a total of 8 disks. For $8.16/hr you could have twelve m1.xl with a total of 48 disks, 3x disk

Re: Keyspace creation error on 0.8 beta2

2011-05-11 Thread Jeremy Hanna
I downloaded a fresh 0.8 beta2 and created keyspaces fine - including the ones below. I don't know if there are relics of a previous install somewhere or something wonky about the classpath. You said that you might have /var/lib/cassandra data left over so one thing to try is starting fresh

Re: Finding big rows

2011-05-11 Thread aaron morton
Let me know if you get anywhere, I'm on there as aaron_morton but I'm also way over in New Zealand. If you are using your own client and writing data you cannot read back check that the byte encoding is always the same and that you are setting appropriate timestamps for every call. In the log

Re: Excessive allocation during hinted handoff

2011-05-11 Thread aaron morton
I'm assuming the two nodes are the ones receiving the HH after they were down. Are there a lot of hints collected while they are down? You can check the HintedHandOffManager MBean in JConsole. What does the TPStats look like on the nodes under pressure? And how many nodes are delivering

Re: Ec2 Stress Results

2011-05-11 Thread Alex Araujo
Hey Adrian - Why did you choose four big instances rather than more smaller ones? Mostly to see the impact of additional CPUs on a write only load. The portion of the application we're migrating from MySQL is very write intensive. The other 8 core option was c1.xl with 7GB of RAM. I will

Re: Unable to add columns to empty row in Column family: Cassandra

2011-05-11 Thread aaron morton
How do you delete the data in the cli? Is it a row delete, e.g. del MyCF['my-key']; What client are you using to insert the row the second time, e.g. a custom thrift wrapper or pycassa? How is the second read done, via the cli? Does the same test work when you only use your app?

Re: Excessive allocation during hinted handoff

2011-05-11 Thread Jonathan Ellis
Doesn't really look abnormal to me for a heavy write load situation which is what receiving hints is. On Wed, May 11, 2011 at 1:55 PM, Gabriel Tataranu gabr...@wajam.com wrote: Greetings, I'm experiencing some issues with 2 nodes (out of more than 10). Right after startup (Listening for