Re: Bootstrap hung
created https://issues.apache.org/jira/browse/CASSANDRA-794 for this

On Fri, Feb 12, 2010 at 2:38 PM, ruslan usifov ruslan.usi...@gmail.com wrote:
> Also I have a problem with StreamInitiateVerbHandler. The problem is in
> PendingFile.getTargetFile, namely the difference in slashes on Windows and
> Unix, so I changed PendingFile.java like this:
>
>     public PendingFile(String targetFile, long expectedBytes, String table)
>     {
>         targetFile_ = targetFile.replaceAll("(?:\\\\|/)+", "/");
>         expectedBytes_ = expectedBytes;
>         table_ = table;
>         ptr_ = 0;
>     }
>
>     public void setTargetFile(String file)
>     {
>         targetFile_ = file.replaceAll("(?:\\\\|/)+", "/");
>     }
Re: Bootstrap hung
Are you mixing Windows and Unix machines in the same cluster?

On Fri, Feb 12, 2010 at 2:38 PM, ruslan usifov ruslan.usi...@gmail.com wrote:
> Also I have a problem with StreamInitiateVerbHandler. The problem is in
> PendingFile.getTargetFile, namely the difference in slashes on Windows and
> Unix, so I changed PendingFile.java like this:
>
>     public PendingFile(String targetFile, long expectedBytes, String table)
>     {
>         targetFile_ = targetFile.replaceAll("(?:\\\\|/)+", "/");
>         expectedBytes_ = expectedBytes;
>         table_ = table;
>         ptr_ = 0;
>     }
>
>     public void setTargetFile(String file)
>     {
>         targetFile_ = file.replaceAll("(?:\\\\|/)+", "/");
>     }
Re: Bootstrap hung
Ruslan,

I think this indicates that SO_SNDBUF is too small on Windows. Windows is the source and FreeBSD is the destination, correct? I've created https://issues.apache.org/jira/browse/CASSANDRA-795 to track this. Can you apply the patch attached to it to see if it addresses the problem?

Thanks.
Gary

2010/2/15 ruslan usifov ruslan.usi...@gmail.com:
> ERROR [MESSAGE-STREAMING-POOL:1] 2010-02-12 19:08:25,500 DebuggableThreadPoolExecutor.java (line 80) Error in ThreadPoolExecutor
> java.lang.RuntimeException: java.io.IOException: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full
>     at sun.nio.ch.SocketDispatcher.write0(Native Method)
>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:33)
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
>     at sun.nio.ch.IOUtil.write(IOUtil.java:60)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
>     at sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:449)
>     at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:520)
>     at org.apache.cassandra.net.FileStreamTask.stream(FileStreamTask.java:95)
>     at org.apache.cassandra.net.FileStreamTask.runMayThrow(FileStreamTask.java:63)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     ... 3 more
> ERROR [MESSAGE-STREAMING-POOL:1] 2010-02-12 19:08:25,515 CassandraDaemon.java (line 78) Fatal exception in thread Thread[MESSAGE-STREAMING-POOL:1,5,main]
> java.lang.RuntimeException: java.io.IOException: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full
>     [same stack trace as above]
>
> 2010/2/12 Jonathan Ellis jbel...@gmail.com
>> Care to include a stack trace? Those are useful when reporting problems.
>>
>> On Fri, Feb 12, 2010 at 2:31 PM, ruslan usifov ruslan.usi...@gmail.com wrote:
>>> Yes
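Gary's actual patch on CASSANDRA-795 is not reproduced here, but the mechanism it relies on, requesting a larger SO_SNDBUF on the socket before streaming, can be sketched in Java NIO as follows. The 64 KB value is an arbitrary illustration, not the value from the patch, and the OS treats the request as a hint:

```java
import java.nio.channels.SocketChannel;

public class SendBufDemo {
    public static void main(String[] args) throws Exception {
        SocketChannel channel = SocketChannel.open();
        // Ask the kernel for a larger send buffer before any data is streamed.
        // The OS may round or clamp the value; read it back to see what you got.
        channel.socket().setSendBufferSize(64 * 1024);
        System.out.println("SO_SNDBUF: " + channel.socket().getSendBufferSize());
        channel.close();
    }
}
```

On Windows, a too-small send buffer combined with large `FileChannel.transferTo` writes is exactly the kind of condition that surfaces as the "insufficient buffer space" IOException in the log above.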
Re: Bootstrap hung
Yes, for a test case.

2010/2/15 Jonathan Ellis jbel...@gmail.com
> Are you mixing Windows and Unix machines in the same cluster?
>
> On Fri, Feb 12, 2010 at 2:38 PM, ruslan usifov ruslan.usi...@gmail.com wrote:
>> Also I have a problem with StreamInitiateVerbHandler. The problem is in
>> PendingFile.getTargetFile, namely the difference in slashes on Windows and
>> Unix, so I changed PendingFile.java like this:
>>
>>     public PendingFile(String targetFile, long expectedBytes, String table)
>>     {
>>         targetFile_ = targetFile.replaceAll("(?:\\\\|/)+", "/");
>>         expectedBytes_ = expectedBytes;
>>         table_ = table;
>>         ptr_ = 0;
>>     }
>>
>>     public void setTargetFile(String file)
>>     {
>>         targetFile_ = file.replaceAll("(?:\\\\|/)+", "/");
>>     }
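The replaceAll call in the quoted patch can be exercised in isolation. A standalone sketch (class and method names here are just for illustration) showing the separator normalization it performs:

```java
public class PathNormalize {
    // Collapse any run of Windows ("\") or Unix ("/") separators into a
    // single "/", as in the PendingFile patch quoted above. Note the regex
    // needs four backslashes in Java source: two for the string literal,
    // leaving "\\" for the regex engine, which matches one literal backslash.
    static String normalize(String path) {
        return path.replaceAll("(?:\\\\|/)+", "/");
    }

    public static void main(String[] args) {
        System.out.println(normalize("var\\lib\\cassandra\\data"));  // var/lib/cassandra/data
        System.out.println(normalize("var//lib/cassandra//data"));   // var/lib/cassandra/data
    }
}
```

The double escaping is almost certainly why the code looks mangled in the archive: the backslashes were stripped somewhere between the poster's editor and the list.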
Nodeprobe Not Working Properly
Hi,

I just installed Cassandra using the Debian package on two servers, _db1a_ and _db1b_. When I run the command _nodeprobe -host db1a ring_, the command only works on the server db1a. iptables is set to allow everything. I also added -Djava.rmi.server.hostname=192.168.1.13 to cassandra.in.sh as mentioned on this page [1]. What am I doing wrong? Any other recommendations?

Thank you,
Shahan

----- db1a -----
db1a:~# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

db1a:~# nodeprobe -host db1a ring
Address       Status     Load          Range                                           Ring
                                       Token(bytes[eaaca3c3bd3caba3e14ee0f85d5cda8a])
192.168.1.13  Up         3.04 KB       Token(bytes[d1deccd61a6632f9040546c5fa57427e])  ||

----- db1b -----
db1b:~# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

db1b:~# nodeprobe -host db1a ring
Error connecting to remote JMX agent!
java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
    java.net.ConnectException: Connection refused
    at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
    at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
    at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
    at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:128)
    at javax.management.remote.rmi.RMIServerImpl_Stub.newClient(Unknown Source)
    at javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2343)
    at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:296)
    at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:267)
    at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:153)
    at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:115)
    at org.apache.cassandra.tools.NodeProbe.main(NodeProbe.java:514)
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:381)
    at java.net.Socket.connect(Socket.java:537)
    at java.net.Socket.connect(Socket.java:487)
    at java.net.Socket.<init>(Socket.java:384)
    at java.net.Socket.<init>(Socket.java:198)
    at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
    at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:146)
    at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)
    ... 10 more

Links:
[1] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg00629.html
Re: Nodeprobe Not Working Properly
On Mon, Feb 15, 2010 at 1:13 PM, Shahan Khan cont...@shahan.me wrote:
> db1b:~# nodeprobe -host db1a ring
> Error connecting to remote JMX agent!
> java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:

This seems to indicate that db1a resolves as 127.0.0.1 on db1b, when it actually needs to resolve to the 192.168 address. Try passing the IP address as the host and it should work.

-Brandon
Re: Nodeprobe Not Working Properly
I tried Brandon's suggestion, but am still getting the same error on the remote server. Any other suggestions? Is it possible that it's a bug?

Thanks,
Shahan

db1a = 192.168.1.13
db1b = 192.168.1.14

=====
db1a:~# nodeprobe -host 192.168.1.14 ring
Error connecting to remote JMX agent!
java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
    java.net.ConnectException: Connection refused
    at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
    at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
    at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
    at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:128)
    at javax.management.remote.rmi.RMIServerImpl_Stub.newClient(Unknown Source)
    at javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2343)
    at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:296)
    at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:267)
    at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:153)
    at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:115)
    at org.apache.cassandra.tools.NodeProbe.main(NodeProbe.java:514)
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:381)
    at java.net.Socket.connect(Socket.java:537)
    at java.net.Socket.connect(Socket.java:487)
    at java.net.Socket.<init>(Socket.java:384)
    at java.net.Socket.<init>(Socket.java:198)
    at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
    at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:146)
    at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)
    ... 10 more

===
db1b:~# nodeprobe -host 192.168.1.14 info
Token(bytes[eaaca3c3bd3caba3e14ee0f85d5cda8a])
Load             : 4 KB
Generation No    : 1266260277
Uptime (seconds) : 9933
Heap Memory (MB) : 54.83 / 1016.13

db1b:~# nodeprobe -host 192.168.1.14 ring
Address       Status     Load          Range                                           Ring
                                       Token(bytes[eaaca3c3bd3caba3e14ee0f85d5cda8a])
192.168.1.13  Up         3.52 KB       Token(bytes[d1deccd61a6632f9040546c5fa57427e])  ||

On Mon, 15 Feb 2010 13:19:42 -0600, Brandon Williams wrote:
> On Mon, Feb 15, 2010 at 1:13 PM, Shahan Khan wrote:
>> db1b:~# nodeprobe -host db1a ring
>> Error connecting to remote JMX agent!
>> java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
>
> This seems to indicate that db1a resolves as 127.0.0.1 on db1b, when it actually
> needs to resolve to the 192.168 address. Try passing the IP address as the host
> and it should work.
>
> -Brandon
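The reason passing the IP did not help: the initial connection to the RMI registry can succeed, but the registry then hands back a stub that embeds the address the *server* believes it has (controlled by java.rmi.server.hostname), and the follow-up connection goes to that address, here 127.0.0.1. The URL shape below mirrors what a JMX client like nodeprobe builds; the port 8080 is an assumption based on the default JMX port of this Cassandra era, so check cassandra.in.sh:

```java
import javax.management.remote.JMXServiceURL;

public class JmxUrlDemo {
    public static void main(String[] args) throws Exception {
        // Step 1 of a JMX connection resolves this URL and contacts the RMI
        // registry on the given host:port. Step 2 uses the stub the registry
        // returns, whose embedded hostname comes from the remote JVM's
        // java.rmi.server.hostname setting, not from this URL. If that
        // setting is wrong, step 2 fails with "Connection refused to host:
        // 127.0.0.1" even though step 1 succeeded.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://192.168.1.13:8080/jmxrmi");
        System.out.println(url);
    }
}
```

So the fix is on the server side: make sure -Djava.rmi.server.hostname is set to the externally reachable address in the JVM options of the node you are probing, and restart that node so the setting takes effect.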
RE: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?
It seems that read latency is sensitive to the number of threads (or Thrift clients): after reducing the number of threads to 15, read latency decreased to ~20ms.

The other problem is: if I keep mixed writes and reads (e.g., 8 write threads plus 7 read threads) running against the 2-node cluster continuously, the read latency goes up gradually (along with the size of the Cassandra data files), and at the end it becomes ~40ms (up from ~20ms) even with only 15 threads. During this process the data files grew from 1.6GB to over 3GB even though I kept writing the same keys/values to Cassandra. It seems that Cassandra keeps appending to sstable data files and will only clean them up during node cleanup or compaction (please correct me if this is incorrect).

Here are my test settings:
  JVM Xmx: 6GB
  KeysCachedFraction: 0.3
  Memtable: 512MB
  Number of records: 1 million (payload is 1000 bytes)

I used JMX and iostat to watch the cluster but can't find any clue for the increasing read latency: JVM memory, GC, CPU usage, tpstats and I/O saturation all seem to be clean. One exception is that the wait time in iostat spikes once in a while but is a small number most of the time. Another thing I noticed is that the JVM doesn't use more than 1GB of memory (out of the 6GB I specified) even though I set KeysCachedFraction to 0.3 and increased the memtable size to 512MB.

Did I miss anything here? How can I diagnose this kind of increasing read latency? Is there any performance tuning guide available?

Thanks,
-Weijun

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Sunday, February 14, 2010 6:22 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

are you i/o bound? what is your on-disk data set size? what does iostat tell you? http://spyced.blogspot.com/2010/01/linux-performance-basics.html

do you have a lot of pending compactions? (tpstats will tell you)

have you increased KeysCachedFraction?
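Weijun's observation that rewriting the same keys still grows the data files matches a log-structured design: writes always append, and obsolete versions of a row are only reclaimed when compaction merges sstables. A toy model (not Cassandra's actual code) of why size grows under overwrites and shrinks at compaction:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AppendOnlyModel {
    public static void main(String[] args) {
        // Every write appends a new entry, even when the key already exists,
        // so rewriting the same keys still grows the on-disk log.
        List<String[]> log = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            log.add(new String[] { "key1", "value" + i });  // 3 writes, 1 key
        }
        // "Compaction" keeps only the newest value per key, reclaiming space.
        Map<String, String> compacted = new HashMap<>();
        for (String[] entry : log) {
            compacted.put(entry[0], entry[1]);  // later entries win
        }
        System.out.println(log.size() + " log entries -> "
                + compacted.size() + " live row(s)");  // 3 log entries -> 1 live row(s)
    }
}
```

Until compaction runs, reads may also have to consult more sstables, which is one plausible contributor to the latency creeping from ~20ms to ~40ms as the files grow.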
On Sun, Feb 14, 2010 at 8:18 PM, Weijun Li weiju...@gmail.com wrote:
> Hello,
>
> I saw some Cassandra benchmark reports mentioning read latency of less than 50ms
> or even 30ms, but my benchmark with 0.5 doesn't seem to support that. Here are my
> settings:
>
>   Nodes: 2 machines, 2x2.5GHz Xeon Quad Core (thus 8 cores), 8GB RAM
>   ReplicationFactor: 2
>   Partitioner: Random
>   JVM Xmx: 4GB
>   Memtable size: 512MB (haven't figured out how to enable binary memtable, so I set both memtable numbers to 512MB)
>   Flushing threads: 2-4
>   Payload: ~1000 bytes, 3 columns in one CF
>   Read/write time measurement: get startTime right before each Java Thrift call; transport objects are pre-created upon creation of each thread
>
> The results show that total write throughput is around 2000/sec (for the 2 nodes
> in the cluster), which is not bad, but read throughput is just around 750/sec.
> However, for each thread the average read latency is more than 100ms. I'm running
> 100 threads for the test and each thread randomly picks a node for each Thrift
> call. So the reads/sec of each thread is just around 7.5, meaning the duration of
> each Thrift call is 1000/7.5 = 133ms. Without replication the cluster write
> throughput is around 3300/s and read throughput is around 1400/s, so the read
> latency is still around 70ms without replication.
>
> Is there anything wrong in my benchmark test? How can I achieve a reasonable read
> latency (< 30ms)?
>
> Thanks,
> -Weijun
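The per-thread arithmetic above (1000 / 7.5 ≈ 133 ms) is an instance of Little's Law for a closed-loop benchmark, where each thread issues one request at a time: throughput = threads / per-call latency. A quick sanity check using the numbers quoted above:

```java
public class LittlesLaw {
    public static void main(String[] args) {
        // Closed-loop load generator: each of N threads has exactly one
        // request in flight, so cluster throughput = N / per-call latency.
        int threads = 100;
        double latencySeconds = 0.133;  // ~133 ms per call, from the thread above
        double throughput = threads / latencySeconds;
        System.out.printf("%.0f reads/sec%n", throughput);  // prints "752 reads/sec"
    }
}
```

The corollary, which matches the follow-up message in this thread, is that with a fixed per-call latency you cannot raise throughput past threads/latency, and adding threads past the cluster's capacity only inflates the measured latency.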