RE: MD5 in the read path
Makes sense. Thanks! -Original Message- From: Joseph Lynch [mailto:joe.e.ly...@gmail.com] Sent: Wednesday, September 26, 2018 9:02 PM To: dev@cassandra.apache.org Subject: Re: MD5 in the read path > > Thank you all for the response. > For RandomPartitioner, MD5 is used to avoid collision. However, why is > it necessary for comparing data between different replicas? Is it not > feasible to use CRC for data comparison? > My understanding is that it is not necessary to use MD5 and we can switch out the message digest function as long as we have an upgrade path. I believe this is the goal of https://issues.apache.org/jira/browse/CASSANDRA-13292. -Joey - To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org For additional commands, e-mail: dev-h...@cassandra.apache.org
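To make the trade-off discussed above concrete, here is a minimal Python sketch (illustrative only — Cassandra computes its digests in Java over the serialized read response) contrasting the 128-bit MD5 digest used for replica comparison with a 32-bit CRC:

```python
import hashlib
import zlib

def md5_digest(response: bytes) -> bytes:
    # 128-bit digest: accidental matches between differing replicas
    # are effectively impossible
    return hashlib.md5(response).digest()

def crc32_digest(response: bytes) -> int:
    # 32-bit checksum: much cheaper to compute, but only 2**32 distinct
    # values, so accidental digest matches become plausible at scale
    return zlib.crc32(response)

data = b"serialized read response"
assert len(md5_digest(data)) == 16   # 16 bytes = 128 bits
assert 0 <= crc32_digest(data) < 2**32
```

The digest width, not collision resistance against an attacker, is the relevant property here, which is why a non-cryptographic 128-bit hash is a viable replacement.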
RE: MD5 in the read path
Thank you all for the response. For RandomPartitioner, MD5 is used to avoid collision. However, why is it necessary for comparing data between different replicas? Is it not feasible to use CRC for data comparison? Thanks, Preetika -Original Message- From: Elliott Sims [mailto:elli...@backblaze.com] Sent: Wednesday, September 26, 2018 7:58 PM To: dev@cassandra.apache.org Subject: Re: MD5 in the read path Would xxHash be large enough for digests? Looks like there's no 128-bit version yet, and it seems like 64 bits would be a bit short to avoid accidental collisions/matches. FarmHash128 or MetroHash128 might be a good choice. Not quite as fast as xxHash64, but not far off and still much, much faster than MD5 and somewhat faster than murmur3. May require some amount of benchmarking, since most of the performance comparisons are C and the performance of the Java implementations may vary drastically. Looks like https://issues.apache.org/jira/browse/CASSANDRA-13291 already switched to Guava, which probably makes Murmur3_128 easier to switch to than the rest, and it may be enough faster than MD5 to be beyond the point of diminishing returns anyways. (far from an expert, but this thread prompted me to go poking through hash options out of curiosity) On Wed, Sep 26, 2018 at 9:04 PM Joseph Lynch wrote: > Michael Kjellman and others (Jason, Sam, et al.) have already done a > lot of work in 4.0 to help change the use of MD5 to something more modern > [1][2]. > Also I cut a ticket a little while back about the significant > performance penalty of using MD5 for digests when doing quorum reads > of wide partitions [1]. Given the profiling that Michael has done and > the production profiling we did I think it's fair to say that changing > the digest from MD5 to > murmur3 or xxHash would lead to a noticeable performance improvement > for quorum reads, perhaps even something like a 2x throughput increase for > e.g. > wide partition workloads. 
> > The hard part is changing the digest hash without breaking older > versions, e.g. during a rolling restart you can't have one node give a > MD5 hash and the other give a xxHash hash as you'll end up with lots > of mismatches and read repairs ... so that would be the tricky part. I > believe that we just need to do what was done during the 3.0 storage > engine refactor (I can't remember the ticket but I'm pretty sure > Sylvain did the work) which checked the messaging version of the > destination node and sent the appropriate hash back. > > -Joey > > [1] https://issues.apache.org/jira/browse/CASSANDRA-13291 > [2] https://issues.apache.org/jira/browse/CASSANDRA-13292 > [3] https://issues.apache.org/jira/browse/CASSANDRA-14611 > > > On Wed, Sep 26, 2018 at 5:00 PM Elliott Sims > wrote: > > > They also don't matter for digests, as long as we're assuming all > > nodes > in > > the cluster are non-malicious (which is a pretty reasonable and > > probably necessary assumption). Or at least, deliberate collisions don't. > > Accidental collisions do, but 128 bits is sufficient to make that > > sufficiently unlikely (as in, chances are nobody will ever see a > > single > > collision) > > > > On Wed, Sep 26, 2018 at 7:58 PM Brandon Williams > wrote: > > > > > Collisions don't matter in the partitioner. > > > > > > On Wed, Sep 26, 2018, 6:53 PM Anirudh Kubatoor < > > anirudh.kubat...@gmail.com > > > > > > > wrote: > > > > > > > Isn't MD5 broken from a security standpoint? From wikipedia > > > > *"One basic requirement of any cryptographic hash function is > > > > that it should be computationally infeasible < > > > > > > > > > > https://en.wikipedia.org/wiki/Computational_complexity_theory#Intractability > > > > > > > > > to > > > > find two non-identical messages which hash to the same value. 
MD5 > fails > > > > this requirement catastrophically; such collisions > > > > <https://en.wikipedia.org/wiki/Collision_resistance> can be found in > > > > seconds on an ordinary home computer"* > > > > > > > > Regards, > > > > Anirudh > > > > > > > > On Wed, Sep 26, 2018 at 7:14 PM Jeff Jirsa wrote: > > > > > > > > > In some installations, it's used for hashing the partition key to > > find > > > > the > > > > > host ( RandomPartitioner ) > > > > > It's used for prepared statement IDs > > > > > It's used for hashing the data for reads to know if the data > matches > > on > > > > all >
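The upgrade-safe approach Joey describes — check the destination node's messaging version and send the digest that peer expects — could be sketched like this (Python illustration with hypothetical version constants; `blake2b` merely stands in for a murmur3/xxHash-class function):

```python
import hashlib

# Hypothetical messaging-version constants; the real ones live in
# org.apache.cassandra.net.MessagingService.
VERSION_PRE_UPGRADE = 11
VERSION_NEW_DIGEST = 12

def digest_for_peer(payload: bytes, peer_version: int) -> bytes:
    """Return a digest the destination node can compare against, so a
    mixed-version cluster never mismatches on algorithm alone."""
    if peer_version >= VERSION_NEW_DIGEST:
        # stand-in for a faster 128-bit hash such as murmur3 or xxHash
        return hashlib.blake2b(payload, digest_size=16).digest()
    # older peers still expect MD5
    return hashlib.md5(payload).digest()

old = digest_for_peer(b"row data", VERSION_PRE_UPGRADE)
new = digest_for_peer(b"row data", VERSION_NEW_DIGEST)
assert old == hashlib.md5(b"row data").digest()
assert len(new) == 16 and new != old
```

During a rolling restart every node can produce either digest, so responses compare correctly regardless of which side has been upgraded.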
MD5 in the read path
Hi all, I have a question about MD5 being used in the read path in Cassandra. I wanted to understand what exactly it is being used for, and why something less complex than MD5, such as CRC, is not used instead. Thanks, Preetika
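A back-of-the-envelope birthday-bound estimate (a rough approximation, not Cassandra code) shows why digest width matters when comparing replica responses:

```python
import math

def collision_probability(bits: int, n: int) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n**2 / 2**(bits + 1))."""
    return 1.0 - math.exp(-(n * n) / float(2 ** (bits + 1)))

n = 10 ** 9  # a billion digest comparisons
assert collision_probability(32, n) > 0.999    # CRC32-sized: collision certain
assert collision_probability(128, n) < 1e-18   # MD5-sized: effectively never
```

A 32-bit checksum is fine for detecting on-disk corruption of a single block, but far too narrow for distinguishing arbitrary replica responses from each other.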
partitioning and CRC
Hi all, I am trying to understand where exactly digests and checksums are used in Cassandra. In my understanding, Murmur3 hashing is used with the murmur3 partitioning scheme, which is also the default configuration. CRC32 is used to detect data corruption and during repair. MD5 is used for partitioning only when random partitioning is configured. Can you please correct me if I'm missing something here? Thanks, Preetika
RE: question on running cassandra-dtests
Hi Ariel, Yes, it is Linux. I checked Cassandra paths in the env output, looks fine to me. Attached is the complete output of both commands: pip list and env. Will try keep-test-dir option as well. Thanks, Preetika -Original Message- From: Ariel Weisberg [mailto:ar...@weisberg.ws] Sent: Wednesday, March 28, 2018 2:38 PM To: dev@cassandra.apache.org Subject: Re: question on running cassandra-dtests Hi, Looks like you are running on Linux? From inside your virtualenv can you run pip list? Can you also give us the output of "env"? Looking at the error you got "NoSuchMethod" it suggests that the Cassandra you pointed to has inconsistent class files and jars or they change during the test. It's statically compiled so it shouldn't be able to build with methods that can't be resolved at runtime. It can still happen if the libraries provided at runtime aren't the same as the ones provided at compile time. It might help to run with --keep-test-dir and then check the test directory for Cassandra logs and review the classpath to make sure everything Cassandra is being loaded up with makes sense. Ariel On Wed, Mar 28, 2018, at 5:09 PM, Tyagi, Preetika wrote: > I'm able to setup and run dtest now, however, I do see a lot of failures. > > For example, I tried running "pytest > --cassandra-dir= home> > nodetool_test.py::TestNodetool::test_correct_dc_rack_in_nodetool_info" > and below is the snippet of errors: > > platform linux -- Python 3.5.2, pytest-3.5.0, py-1.5.3, pluggy-0.6.0 > rootdir: /home//cassandra-dtest, inifile: > pytest.ini > plugins: timeout-1.2.1, flaky-3.4.0 > collected 1 item > > > > > nodetool_test.py E > > > > [100%]FE [100%] > > == > = > ERRORS > == > == > __ > _ ERROR at teardown of > TestNodetool.test_correct_dc_rack_in_nodetool_info > __ > _ Unexpected error found in node logs (see > stdout for full details). 
> Errors: [ERROR [MessagingService-NettyOutbound-Thread-4-1] 2018-03-28 > 00:24:36,784 OutboundHandshakeHandler.java:209 - Failed to properly > handshake with peer 127.0.0.1:7000 (GOSSIP). Closing the channel. > java.lang.NoSuchMethodError: > org.apache.cassandra.net.async.OutboundConnectionIdentifier.connection > Address()Ljava/ > net/InetSocketAddress; > at > org.apache.cassandra.net.async.OutboundHandshakeHandler.channelActive( > OutboundHandshakeHandler.java:107) > ~[main/:na] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(Abs > tractChannelHandlerContext.java:213) > [netty-all-4.1.14.Final.jar:4.1.14.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(Abs > tractChannelHandlerContext.java:199) > [netty-all-4.1.14.Final.jar:4.1.14.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(Abstr > actChannelHandlerContext.java:192) > [netty-all-4.1.14.Final.jar:4.1.14.Final] > at io.netty.channel.DefaultChannelPipeline > $HeadContext.channelActive(DefaultChannelPipeline.java:1330) [netty- > all-4.1.14.Final.jar:4.1.14.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(Abs > tractChannelHandlerContext.java:213) > [netty-all-4.1.14.Final.jar:4.1.14.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(Abs > tractChannelHandlerContext.java:199) > [netty-all-4.1.14.Final.jar:4.1.14.Final] > at > io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChann > elPipeline.java:910) [netty-all-4.1.14.Final.jar:4.1.14.Final] > at io.netty.channel.epoll.AbstractEpollStreamChannel > $EpollStreamUnsafe.fulfillConnectPromise(AbstractEpollStreamChannel.ja > va:855) [netty-all-4.1.14.Final.jar:4.1.14.Final] > at io.netty.channel.epoll.AbstractEpollStreamChannel > $EpollStreamUnsafe.finishConnec
RE: question on running cassandra-dtests
nnelActive(AbstractChannelHandlerContext.java:199) [netty-all-4.1.14.Final.jar:4.1.14.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:192) [netty-all-4.1.14.Final.jar:4.1.14.Final] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1330) [netty-all-4.1.14.Final.jar:4.1.14.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213) [netty-all-4.1.14.Final.jar:4.1.14.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199) [netty-all-4.1.14.Final.jar:4.1.14.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:910) [netty-all-4.1.14.Final.jar:4.1.14.Final] at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.fulfillConnectPromise(AbstractEpollStreamChannel.java:855) [netty-all-4.1.14.Final.jar:4.1.14.Final] at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.finishConnect(AbstractEpollStreamChannel.java:888) [netty-all-4.1.14.Final.jar:4.1.14.Final] at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollOutReady(AbstractEpollStreamChannel.java:907) [netty-all-4.1.14.Final.jar:4.1.14.Final] at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:394) [netty-all-4.1.14.Final.jar:4.1.14.Final] at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:296) [netty-all-4.1.14.Final.jar:4.1.14.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-all-4.1.14.Final.jar:4.1.14.Final] at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) [netty-all-4.1.14.Final.jar:4.1.14.Final] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151], ERROR [MessagingService-NettyOutbound-Thread-4-7] 2018-03-28 00:24:37,063 
OutboundHandshakeHandler.java:209 - Failed to properly handshake with peer 127.0.0.3:7000 (GOSSIP). Closing the channel. Can somebody help me figure out how I can run dtests successfully? Once I am able to do that, I will be able to proceed with the implementation of tests for the JIRA ticket I'm working on. Thanks, Preetika -Original Message- From: Ariel Weisberg [mailto:ar...@weisberg.ws] Sent: Tuesday, March 27, 2018 7:15 PM To: dev@cassandra.apache.org Subject: Re: question on running cassandra-dtests Hi, Great! Glad you were able to get up and running. The dtests can be tricky if you aren't already somewhat familiar with Python. Ariel On Mon, Mar 26, 2018, at 9:10 PM, Murukesh Mohanan wrote: > On Tue, Mar 27, 2018 at 6:47 Ariel Weisberg <ar...@weisberg.ws> wrote: > > > Hi, > > > > Are you deleting the venv before creating it? You shouldn't really > > need to use sudo for the virtualenv. That is going to make things > > potentially wonky. Naming it cassandra-dtest might also do something > > wonky if you have a cassandra-dtest directory already. I usually > > name it just venv and place it in the same subdir as the requirements file. > > > > Also running sudo is going to create a new shell and then exit the > > shell immediately so when you install the requirements it might be > > doing it not in the venv, but in whatever is going on inside the sudo shell. > > > Yep, looking at the logs, that's probably the issue. When activating a > venv (with `source .../bin/activate`), it sets environment variables > (`PATH`, `PYTHONHOME` etc.) so that the virtualenv's Python, pip are > used instead of the system Python and pip. sudo defaults to using a > clean PATH and resetting most of the user's environment, so the > effects of the venv are lost when running in sudo. > > > The advantage of virtualenv is not needing to mess with system > packages at > > all so sudo is inadvisable when creating, activating, and pip > > installing things. 
> > > > You might need to use pip3 instead of pip, but I suspect that in a > > correct venv pip is going to point to pip3. > > > > Ariel > > > > On Mon, Mar 26, 2018, at 5:31 PM, Tyagi, Preetika wrote: > > > Yes, that's correct. I followed README and ran all below steps to > > > create virtualenv. Attached is the output of all commands I ran > > > successfully except the last one i.e. pytest. > > > > > > Could you please let me know if you see anything wrong or missing? > > > > > > Thanks, > > > Preetika > > > > > > -Original Message- > > > From: Ariel Weisberg [mailto:ar...@weisberg.ws] > > > Sent: Monda
RE: question on running cassandra-dtests
Yes, that's correct. I followed README and ran all below steps to create virtualenv. Attached is the output of all commands I ran successfully except the last one i.e. pytest. Could you please let me know if you see anything wrong or missing? Thanks, Preetika -Original Message- From: Ariel Weisberg [mailto:ar...@weisberg.ws] Sent: Monday, March 26, 2018 9:32 AM To: dev@cassandra.apache.org Subject: Re: question on running cassandra-dtests Hi, Your environment is python 2.7 when it should be python 3. See: > File "/usr/local/lib/python2.7/dist-packages/_pytest/assertion/ > rewrite.py", line 213, in load_module Are you using virtualenv to create a python 3 environment to use with the tests? From README.md: **Note**: While virtualenv isn't strictly required, using virtualenv is almost always the quickest path to success as it provides common base setup across various configurations. 1. Install virtualenv: ``pip install virtualenv`` 2. Create a new virtualenv: ``virtualenv --python=python3 --no-site-packages ~/dtest`` 3. Switch/Activate the new virtualenv: ``source ~/dtest/bin/activate`` 4. Install remaining DTest Python dependencies: ``pip install -r /path/to/cassandra-dtest/requirements.txt`` Regards, Ariel On Mon, Mar 26, 2018, at 11:13 AM, Tyagi, Preetika wrote: > I was able to run requirements.txt with success. 
Below is the error I get: > > Traceback (most recent call last): > File "/usr/local/lib/python2.7/dist-packages/_pytest/config.py", > line 371, in _importconftest > mod = conftestpath.pyimport() > File "/usr/local/lib/python2.7/dist-packages/py/_path/local.py", > line 668, in pyimport > __import__(modname) > File "/usr/local/lib/python2.7/dist-packages/_pytest/assertion/ > rewrite.py", line 213, in load_module > py.builtin.exec_(co, mod.__dict__) > File "/usr/local/lib/python2.7/dist-packages/py/_builtin.py", line > 221, in exec_ > exec2(obj, globals, locals) > File "", line 7, in exec2 > File "/home//conftest.py", line 11, in > from itertools import zip_longest > ImportError: cannot import name zip_longest > ERROR: could not load /home//conftest.py > > Thanks, > Preetika > > -Original Message- > From: Murukesh Mohanan [mailto:murukesh.moha...@gmail.com] > Sent: Sunday, March 25, 2018 10:48 PM > To: dev@cassandra.apache.org > Subject: Re: question on running cassandra-dtests > > The complete error is needed. 
I get something similar if I hadn't run > `pip3 install -r requirements.txt`: > > Traceback (most recent call last): > File "/usr/local/lib/python3.6/site-packages/_pytest/config.py", > line 328, in _getconftestmodules > return self._path2confmods[path] > KeyError: local('/home/muru/dev/cassandra-dtest') > > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/usr/local/lib/python3.6/site-packages/_pytest/config.py", > line 359, in _importconftest > return self._conftestpath2mod[conftestpath] > KeyError: local('/home/muru/dev/cassandra-dtest/conftest.py') > > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/usr/local/lib/python3.6/site-packages/_pytest/config.py", > line 365, in _importconftest > mod = conftestpath.pyimport() > File "/usr/local/lib/python3.6/site-packages/py/_path/local.py", > line 668, in pyimport > __import__(modname) > File "/usr/local/lib/python3.6/site-packages/_pytest/assertion/ > rewrite.py", line 212, in load_module > py.builtin.exec_(co, mod.__dict__) > File "/home/muru/dev/cassandra-dtest/conftest.py", line 13, in > > from dtest import running_in_docker, > cleanup_docker_environment_before_test_execution > File "/home/muru/dev/cassandra-dtest/dtest.py", line 12, in > import cassandra > ModuleNotFoundError: No module named 'cassandra' > ERROR: could not load /home/muru/dev/cassandra-dtest/conftest.py > > Of course, `pip3 install -r requirements.txt` creates an `src` > directory with appropriate branches of ccm and cassandra-driver checked out. > > If you have run `pip3 install -r requirements.txt`, then something > else is wrong and we need the complete error log. > > On 2018/03/23 20:22:47, "Tyagi, Preetika" <preetika.ty...@intel.com> wrote: > > Hi All, > > > > I am trying to setup and run Cassandra-dtests so that I can write some > > tests for a JIRA ticket I have been working on. 
> > This is the repo I am using: > > https://github.com/apache/cassandra-dtest > > I followed all the instru
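Putting Ariel's and Murukesh's points together, a sudo-free setup along the lines of the README might look like this (the requirements path is illustrative):

```shell
# Create and use the venv as a normal user; running these under sudo
# discards the activated environment (sudo resets PATH and env vars).
python3 -m venv ~/dtest          # or: virtualenv --python=python3 ~/dtest
. ~/dtest/bin/activate
python -c 'import sys; assert sys.version_info[0] == 3'  # confirm Python 3
# then, still inside the venv (no sudo):
#   pip install -r /path/to/cassandra-dtest/requirements.txt
```

If `python` inside the activated venv is not Python 3, the dtest conftest will fail to import, as seen later in the thread.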
RE: question on running cassandra-dtests
I was able to run requirements.txt with success. Below is the error I get: Traceback (most recent call last): File "/usr/local/lib/python2.7/dist-packages/_pytest/config.py", line 371, in _importconftest mod = conftestpath.pyimport() File "/usr/local/lib/python2.7/dist-packages/py/_path/local.py", line 668, in pyimport __import__(modname) File "/usr/local/lib/python2.7/dist-packages/_pytest/assertion/rewrite.py", line 213, in load_module py.builtin.exec_(co, mod.__dict__) File "/usr/local/lib/python2.7/dist-packages/py/_builtin.py", line 221, in exec_ exec2(obj, globals, locals) File "", line 7, in exec2 File "/home//conftest.py", line 11, in from itertools import zip_longest ImportError: cannot import name zip_longest ERROR: could not load /home//conftest.py Thanks, Preetika -Original Message- From: Murukesh Mohanan [mailto:murukesh.moha...@gmail.com] Sent: Sunday, March 25, 2018 10:48 PM To: dev@cassandra.apache.org Subject: Re: question on running cassandra-dtests The complete error is needed. 
I get something similar if I hadn't run `pip3 install -r requirements.txt`: Traceback (most recent call last): File "/usr/local/lib/python3.6/site-packages/_pytest/config.py", line 328, in _getconftestmodules return self._path2confmods[path] KeyError: local('/home/muru/dev/cassandra-dtest') During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/local/lib/python3.6/site-packages/_pytest/config.py", line 359, in _importconftest return self._conftestpath2mod[conftestpath] KeyError: local('/home/muru/dev/cassandra-dtest/conftest.py') During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/local/lib/python3.6/site-packages/_pytest/config.py", line 365, in _importconftest mod = conftestpath.pyimport() File "/usr/local/lib/python3.6/site-packages/py/_path/local.py", line 668, in pyimport __import__(modname) File "/usr/local/lib/python3.6/site-packages/_pytest/assertion/rewrite.py", line 212, in load_module py.builtin.exec_(co, mod.__dict__) File "/home/muru/dev/cassandra-dtest/conftest.py", line 13, in from dtest import running_in_docker, cleanup_docker_environment_before_test_execution File "/home/muru/dev/cassandra-dtest/dtest.py", line 12, in import cassandra ModuleNotFoundError: No module named 'cassandra' ERROR: could not load /home/muru/dev/cassandra-dtest/conftest.py Of course, `pip3 install -r requirements.txt` creates an `src` directory with appropriate branches of ccm and cassandra-driver checked out. If you have run `pip3 install -r requirements.txt`, then something else is wrong and we need the complete error log. On 2018/03/23 20:22:47, "Tyagi, Preetika" <preetika.ty...@intel.com> wrote: > Hi All, > > I am trying to setup and run Cassandra-dtests so that I can write some tests > for a JIRA ticket I have been working on. 
> > This is the repo I am using: > https://github.com/apache/cassandra-dtest > I followed all the instructions and installed dependencies. > > However, when I run "pytest --cassandra-dir=<cassandra directory>" > > it throws the error "could not load /conftest.py". > > I checked that this file (conftest.py) exists in the Cassandra-dtest source root > and I'm not sure why it cannot find it. Does anyone have any idea what might > be going wrong here? > > I haven't used dtests before so I wonder if I'm missing something here. > > Thanks, > Preetika
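For reference, the ImportError quoted above is a Python 2 symptom: `itertools.izip_longest` became `zip_longest` in Python 3, which is why the conftest only imports under a Python 3 interpreter. A quick check:

```python
# Under Python 3 this import succeeds; under the system Python 2.7 it
# raises exactly the "cannot import name zip_longest" error above.
from itertools import zip_longest

pairs = list(zip_longest([1, 2, 3], ["a"], fillvalue=None))
assert pairs == [(1, "a"), (2, None), (3, None)]
```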
question on running cassandra-dtests
Hi All, I am trying to set up and run Cassandra-dtests so that I can write some tests for a JIRA ticket I have been working on. This is the repo I am using: https://github.com/apache/cassandra-dtest I followed all the instructions and installed dependencies. However, when I run "pytest --cassandra-dir=<cassandra directory>", it throws the error "could not load /conftest.py". I checked that this file (conftest.py) exists in the Cassandra-dtest source root and I'm not sure why it cannot find it. Does anyone have any idea what might be going wrong here? I haven't used dtests before so I wonder if I'm missing something here. Thanks, Preetika
RE: Use of OpOrder in memtable
Ah I see. That makes sense. And it doesn't have anything to do with the read requests going on in parallel with write requests, right? I mean we read the data from memtable depending on whatever has been written into memtable so far and return it to the client (of course including SSTable read and timestamp comparison etc.) -Original Message- From: Benedict Elliott Smith [mailto:bened...@apache.org] Sent: Tuesday, February 13, 2018 2:25 PM To: dev@cassandra.apache.org Subject: Re: Use of OpOrder in memtable If you look closely, there can be multiple memtables extant at once. While all "new" writes are routed to the latest memtable, there may still be writes that have begun but not yet completed. The memtable cannot be flushed until any stragglers have completed, and some stragglers *may* still need to be routed to their designated memtable (if they had only just begun when the flush triggered). It helps avoid these race conditions on either side of the equation. On 13 February 2018 at 22:09, Tyagi, Preetika <preetika.ty...@intel.com> wrote: > Hi all, > > I'm trying to understand the behavior of memtable when writes/flush > operations are going on in parallel. > > In my understanding, once a memtable is full it is queued for flushing > and a new memtable is created for ongoing write operations. > However, I was looking at the code and it looks like the OpOrder class > is used (don't understand all details) to ensure the synchronization > between producers (writes) and consumers (batch flushes). > So I am a bit confused about when exactly it is needed. There will > always be only one latest memtable for write operations and all old > memtables are flushed so where this producer/consumer interaction on > the same memtable is needed? > > Thanks, > Preetika > >
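Benedict's explanation can be caricatured with a toy in-flight counter (a gross simplification — the real OpOrder tracks groups of operations, so a barrier only waits for writes that started before it, while new writes proceed against the new memtable):

```python
import threading
import time

class ToyOpOrder:
    """Toy model: writers register while in flight; a flush barrier
    blocks until every straggling write has finished."""
    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0

    def start(self):                 # a write begins
        with self._cond:
            self._in_flight += 1

    def finish(self):                # a write completes
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def await_barrier(self):         # flusher waits for stragglers
        with self._cond:
            while self._in_flight:
                self._cond.wait()

order = ToyOpOrder()
order.start()                        # a write begins against the memtable
flushed = []

def flush():
    order.await_barrier()            # flush must wait for the straggler
    flushed.append(True)

t = threading.Thread(target=flush)
t.start()
time.sleep(0.05)
assert not flushed                   # barrier still blocked
order.finish()                       # straggling write completes
t.join(timeout=5)
assert flushed                       # flush proceeded only afterwards
```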
Use of OpOrder in memtable
Hi all, I'm trying to understand the behavior of the memtable when write/flush operations are going on in parallel. In my understanding, once a memtable is full it is queued for flushing and a new memtable is created for ongoing write operations. However, I was looking at the code, and it looks like the OpOrder class is used (I don't understand all the details) to ensure synchronization between producers (writes) and consumers (batch flushes). So I am a bit confused about when exactly it is needed. There will always be only one latest memtable for write operations, and all old memtables are flushed, so why is this producer/consumer interaction on the same memtable needed? Thanks, Preetika
RE: range queries on partition key supported?
Thank you, Kurt. Just one more clarification. > And, then entire partition on each node will be searched based on the clustering key (i.e. "time" in this case). No. It will skip to the section of the partition with time = '12:00'. Cassandra should be smart enough to avoid reading the whole partition. Yeah, that seems to be correct. I probably didn't phrase it correctly. Now let's assume a specific node is selected based on the token range and we need to look up the data with time='12:00' within the partition, which was obviously within the token range. Now on this node, there may be more than one partition (let's take two partitions for example) which qualifies for this token range. In that case, these two partitions will need to be looked up to get the data with the given time = '12:00'. So I'm wondering how these two partitions will be looked up on this node. What would the request query look like on this node to get these partitions? Does that make sense? Do you think I'm missing something? Thanks, Preetika -Original Message- From: kurt greaves [mailto:k...@instaclustr.com] Sent: Wednesday, January 31, 2018 9:46 PM To: dev@cassandra.apache.org Subject: Re: range queries on partition key supported? > > So that means more than one nodes can be selected to fulfill a range > query based on the token, correct? Yes. When doing a token range query Cassandra will need to send requests to any node that owns part of the token range requested. This could be just one set of replicas or more, depending on how your token ring is arranged. You could avoid querying multiple nodes by limiting the token() calls to be within one token range. And, then entire partition on each node will be searched based on the > clustering key (i.e. "time" in this case). No. it will skip to the section of the partition with time = '12:00'. Cassandra should be smart enough to avoid reading the whole partition. 
On 31 January 2018 at 06:57, Tyagi, Preetika <preetika.ty...@intel.com> wrote: > So that means more than one nodes can be selected to fulfill a range > query based on the token, correct? > > I was looking at this link: https://www.datastax.com/dev/ > blog/a-deep-look-to-the-cql-where-clause > > In the example query, > SELECT * FROM numberOfRequests > WHERE token(cluster, date) > token('cluster1', '2015-06-03') > AND token(cluster, date) <= token('cluster1', '2015-06-05') > AND time = '12:00' > > More than one nodes might get picked for this token based range query. > And, then entire partition on each node will be searched based on the > clustering key (i.e. "time" in this case). > Is my understanding correct? > > Thanks, > Preetika > > -Original Message- > From: J. D. Jordan [mailto:jeremiah.jor...@gmail.com] > Sent: Tuesday, January 30, 2018 10:13 AM > To: dev@cassandra.apache.org > Subject: Re: range queries on partition key supported? > > A range query can be performed on the token of a partition key, not on > the value. > > -Jeremiah > > > On Jan 30, 2018, at 12:21 PM, Tyagi, Preetika > > <preetika.ty...@intel.com> > wrote: > > > > Hi All, > > > > I have a quick question on Cassandra's behavior in case of partition > keys. I know that range queries are allowed in general, however, is it > also allowed on partition keys as well? The partition key is used as > an input to determine a node in a cluster, so I'm wondering how one > can possibly perform range query on that. > > > > Thanks, > > Preetika > > > > - > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > For additional commands, e-mail: dev-h...@cassandra.apache.org > > > - > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > For additional commands, e-mail: dev-h...@cassandra.apache.org > >
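Kurt's point — that a token range query fans out to every node owning part of the requested range — can be illustrated with a toy ring (a stand-in hash ring with three single-token nodes, nothing like the real Murmur3Partitioner):

```python
import bisect

# Toy ring: each (token, owner) entry means the node owns tokens up to
# and including its value; ownership wraps around the ring.
RING = [(100, "node1"), (200, "node2"), (300, "node3")]
TOKENS = [t for t, _ in RING]

def owner(token: int) -> str:
    # the first node whose ring token is >= this token owns it
    return RING[bisect.bisect_left(TOKENS, token) % len(RING)][1]

def nodes_for_range(lo: int, hi: int) -> list:
    """Every node owning any token in (lo, hi] must be queried."""
    return sorted({owner(t) for t in range(lo + 1, hi + 1)})

assert nodes_for_range(150, 180) == ["node2"]                   # one owner
assert nodes_for_range(50, 250) == ["node1", "node2", "node3"]  # fans out
```

Restricting the `token()` bounds so the range falls within a single owner's slice is exactly the "limit the token() calls to one token range" optimization mentioned above.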
RE: create branch in my github account
Thank you Michael. I was able to create the branch and push my changes! :) Preetika -Original Message- From: Michael Shuler [mailto:mshu...@pbandjelly.org] On Behalf Of Michael Shuler Sent: Tuesday, January 30, 2018 2:04 PM To: dev@cassandra.apache.org Subject: Re: create branch in my github account On 01/30/2018 03:47 PM, Tyagi, Preetika wrote: > Hi all, > > I'm working on the JIRA ticket CASSANDRA-13981 and pushed a patch > yesterday, however, I have been suggested to create a branch in my > github account and then push all changes into that. The patch is too > big hence this seems to be a better approach. I haven't done it before > so wanted to ensure I do it correctly without messing things up :) > > > 1. On Cassandra GitHub: https://github.com/apache/cassandra, > click on "Fork" to create my own copy in my account. > > 2. Git clone on the forked branch above s/branch/repository/ - this is a new forked repo, not a branch > 3. Git checkout git checkout trunk # since 13981 appears to for 4.0 (trunk) # if you worked off some random sha, you may need to rebase on # trunk HEAD, otherwise it may not cleanly merge and that will be # the first patch review request. git checkout -b CASSANDRA-13981 # create a new branch > 4. Apply my patch > > 5. Git commit -m "" > > 6. Git push origin trunk git push origin CASSANDRA-13981 # push a new branch to your fork > Please let me know if you notice any issues. Thanks for your help! You could do this in your fork on the trunk repository, but it's probably better to create a new branch, so you can fetch changes from the upstream trunk branch and rebase your branch, if that is needed. It is very common to have a number of remotes configured in your local repository: one for your fork, one for the apache upstream, ones for other user's forks, etc. If you do your work directly in your trunk branch, you'll have conflicts when pulling in new commits from apache/cassandra trunk, for example. 
-- Michael
RE: range queries on partition key supported?
So that means more than one nodes can be selected to fulfill a range query based on the token, correct? I was looking at this link: https://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause In the example query, SELECT * FROM numberOfRequests WHERE token(cluster, date) > token('cluster1', '2015-06-03') AND token(cluster, date) <= token('cluster1', '2015-06-05') AND time = '12:00' More than one nodes might get picked for this token based range query. And, then entire partition on each node will be searched based on the clustering key (i.e. "time" in this case). Is my understanding correct? Thanks, Preetika -Original Message- From: J. D. Jordan [mailto:jeremiah.jor...@gmail.com] Sent: Tuesday, January 30, 2018 10:13 AM To: dev@cassandra.apache.org Subject: Re: range queries on partition key supported? A range query can be performed on the token of a partition key, not on the value. -Jeremiah > On Jan 30, 2018, at 12:21 PM, Tyagi, Preetika > <preetika.ty...@intel.com> > wrote: > > Hi All, > > I have a quick question on Cassandra's behavior in case of partition keys. I > know that range queries are allowed in general, however, is it also allowed > on partition keys as well? The partition key is used as an input to determine > a node in a cluster, so I'm wondering how one can possibly perform range > query on that. > > Thanks, > Preetika >
create branch in my github account
Hi all,

I'm working on the JIRA ticket CASSANDRA-13981 and pushed a patch yesterday; however, I have been advised to create a branch in my github account and then push all changes into that. The patch is too big, so this seems to be a better approach. I haven't done it before, so I wanted to make sure I do it correctly without messing things up :)

1. On Cassandra GitHub: https://github.com/apache/cassandra, click on "Fork" to create my own copy in my account.
2. Git clone on the forked branch above
3. Git checkout
4. Apply my patch
5. Git commit -m ""
6. Git push origin trunk

Please let me know if you notice any issues. Thanks for your help!

Preetika
range queries on partition key supported?
Hi All,

I have a quick question on Cassandra's behavior with partition keys. I know that range queries are allowed in general; however, are they also allowed on partition keys? The partition key is used as an input to determine a node in a cluster, so I'm wondering how one can possibly perform a range query on that.

Thanks,
Preetika
RE: simple vs complex cells
I agree. But this method is probably calculating the deletion time to determine the removal strategy for this complex column. However, when someone tries to read this complex column and there are multiple copies of it in more than one sstable, how will the read request determine which copy of the complex data is the latest and should be returned to the user?

Preetika

-----Original Message-----
From: Minh Do [mailto:m...@netflix.com.INVALID]
Sent: Tuesday, January 23, 2018 5:24 PM
To: dev@cassandra.apache.org
Subject: Re: simple vs complex cells

Based on the C* 3.x code base, I believe a complex column consists of many cells and each cell has its own timestamp. Then, there is a method to compute the maxTimestamp for a complex column:

    public long maxTimestamp()
    {
        long timestamp = complexDeletion.markedForDeleteAt();
        for (Cell cell : this)
            timestamp = Math.max(timestamp, cell.timestamp());
        return timestamp;
    }

On Tue, Jan 23, 2018 at 4:22 PM, Tyagi, Preetika <preetika.ty...@intel.com> wrote:
> Hi all,
>
> I'm trying to understand the behavior of simple and complex columns in
> Cassandra. I was looking at UnfilteredSerializer.java:
> serializeRowBody() checks for a timestamp flag and only then writes the
> timestamp. In the writeComplexColumn() case, there is no timestamp
> being written. Also, as per my understanding, a complex column contains
> several simple columns, each of which may or may not have a timestamp
> associated.
>
> My question is: if there is no mandatory timestamp for either simple or
> complex columns, how will the data be merged at read time based on the
> timestamp, given that there can be more than one copy of the same data
> in sstables?
>
> Also, is it allowed in CQL queries to update one or more simple columns
> within a complex column? Or is the entire complex column updated
> whenever there is an update query?
>
> Thanks,
> Preetika
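Since each cell of a complex column carries its own timestamp, the read-time reconciliation Minh describes can be sketched roughly as below. This is a hedged toy model — the class and method names are illustrative, not Cassandra's actual Cell/ComplexColumnData API — showing the two rules that decide "latest": the highest-timestamp copy of each cell path wins, and the complex deletion marker shadows any cell written before it.

```java
import java.util.*;

// Toy reconciliation of two copies of a complex (collection-like) column
// read from different sstables. Illustrative names only.
public class ComplexCellMerge
{
    static class Cell
    {
        final String path;   // e.g. a map key or list position
        final long timestamp;
        final String value;

        Cell(String path, long timestamp, String value)
        {
            this.path = path;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    static Map<String, Cell> merge(long complexDeletionTs, List<Cell> a, List<Cell> b)
    {
        List<Cell> all = new ArrayList<>(a);
        all.addAll(b);
        Map<String, Cell> merged = new HashMap<>();
        for (Cell c : all)
        {
            if (c.timestamp <= complexDeletionTs)
                continue; // shadowed by the whole-collection deletion marker
            Cell prev = merged.get(c.path);
            if (prev == null || c.timestamp > prev.timestamp)
                merged.put(c.path, c); // newest cell per path wins
        }
        return merged;
    }
}
```

So the column as a whole never needs one mandatory timestamp: the read path merges cell-by-cell, and maxTimestamp() above is just a summary over those per-cell timestamps plus the deletion marker.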
simple vs complex cells
Hi all,

I'm trying to understand the behavior of simple and complex columns in Cassandra. I was looking at UnfilteredSerializer.java: serializeRowBody() checks for a timestamp flag and only then writes the timestamp. In the writeComplexColumn() case, there is no timestamp being written. Also, as per my understanding, a complex column contains several simple columns, each of which may or may not have a timestamp associated.

My question is: if there is no mandatory timestamp for either simple or complex columns, how will the data be merged at read time based on the timestamp, given that there can be more than one copy of the same data in sstables?

Also, is it allowed in CQL queries to update one or more simple columns within a complex column? Or is the entire complex column updated whenever there is an update query?

Thanks,
Preetika
code formatting
Hi,

I have a quick question on the code formatting for Cassandra using IntelliJ. I found a code formatter JAR for Cassandra here: https://wiki.apache.org/cassandra/CodeStyle?action=AttachFile&do=view&target=intellij-codestyle.jar

Does someone know how it can be imported into the IntelliJ Cassandra project settings so that the code can be formatted automatically (using the Ctrl + Shift + F command)? Or is there a better way to do it?

Thanks,
Preetika
Question on submitting a patch
Hi all,

When I click on the "Submit Patch" option, it pops up a new screen which asks for a bunch of details, including Fix Version(s). Does the patch need to be synced up with the latest repo, or can I just choose the version I worked with (which may not necessarily be the latest, in which case one would need to fetch that specific repo version in order to compile the source with the patch)?

Also, there is no option to upload the patch file on this screen. Can someone point out where to actually upload the patch? I haven't done this before, so I might be asking dumb questions! :)

Thanks,
Preetika
RE: How to fetch replication factor of a given keyspace
Yeah, I tried doing that. The problem is that when I call

    Schema.instance.getKeyspaceInstance(keyspaceName)

or getKeyspaceMetadata(), it returns null for all keyspace names. Is that expected?

Thanks,
Preetika

-----Original Message-----
From: Nate McCall [mailto:zznat...@gmail.com]
Sent: Wednesday, December 20, 2017 6:09 PM
To: dev <dev@cassandra.apache.org>
Subject: Re: How to fetch replication factor of a given keyspace

I think you want:

    Schema.instance.getKeyspaceMetadata

There is a ReplicationParams nested under there which should have everything you need fully populated.

On Thu, Dec 21, 2017 at 2:02 PM, Tyagi, Preetika <preetika.ty...@intel.com> wrote:
> Hi,
>
> If I need to get the replication factor of a given keyspace in nodetool
> commands (e.g. status), how can I do that? I'm trying to figure it out
> for a JIRA item I'm working on.
>
> I tried the following:
>
>     Keyspace keyspace = Keyspace.open(keyspaceName);
>     int rf = keyspace.getReplicationStrategy().getReplicationFactor();
>
> However, it runs into some issues, since internally something doesn't
> get initialized while looking up keyspaces/metadata.
>
> Any ideas on how I can approach it differently?
>
> Thanks,
> Preetika
How to fetch replication factor of a given keyspace
Hi,

If I need to get the replication factor of a given keyspace in nodetool commands (e.g. status), how can I do that? I'm trying to figure it out for a JIRA item I'm working on.

I tried the following:

    Keyspace keyspace = Keyspace.open(keyspaceName);
    int rf = keyspace.getReplicationStrategy().getReplicationFactor();

However, it runs into some issues, since internally something doesn't get initialized while looking up keyspaces/metadata.

Any ideas on how I can approach it differently?

Thanks,
Preetika
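For context on what the replication factor drives, here is a hedged toy model of SimpleStrategy-style replica placement: starting at the first node whose token is at or past the data's token, walk the ring clockwise collecting RF distinct nodes. This is a sketch only — Cassandra's real strategies (especially NetworkTopologyStrategy) also account for racks and datacenters, and the node layout below is made up.

```java
import java.util.*;

// Toy SimpleStrategy-style placement: RF consecutive nodes on the ring,
// starting from the token's position, ignoring racks/DCs.
public class SimpleReplicaPlacement
{
    // nodeTokens must be sorted ascending; returns the indices of the RF replicas.
    static List<Integer> replicasFor(long token, long[] nodeTokens, int rf)
    {
        int n = nodeTokens.length;
        int start = 0;
        while (start < n && nodeTokens[start] < token)
            start++;
        start = start % n; // tokens past the last node wrap to the first

        List<Integer> replicas = new ArrayList<>();
        for (int i = 0; i < Math.min(rf, n); i++)
            replicas.add((start + i) % n); // walk clockwise, one node per replica
        return replicas;
    }
}
```

The keyspace's replication factor is simply the `rf` argument here: raising it means each token is written to more consecutive nodes on the ring.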
how to build nodetool source
Hi all,

I might be missing something very simple here, but it seems I cannot find a way to build the tools/nodetool/* source files correctly in my dev setup.

For example, when I make a simple code change in line 105 of Status.java (System.out.println("Status=Up/Down");) to print something else and run an ant build, the change doesn't show up when I run the "nodetool status" command next time. It still prints the old message.

I'm using IntelliJ for development and building the code. I would appreciate any help!

Thanks,
Preetika
RE: Why Cassandra unit test run skips some of it?
Yeah - I actually figured it out. The CDC tests were ignored, as Josh mentioned, since CDC was disabled. I looked at the code, and the other skipped tests were also marked as ignored. Thank you both for the response!

Preetika

-----Original Message-----
From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Thursday, September 21, 2017 11:03 AM
To: dev@cassandra.apache.org
Subject: Re: Why Cassandra unit test run skips some of it?

There's also at least one test we skip in CircleCI where we know the container memory is insufficient for the test - based on environment variables in that case.

--
Jeff Jirsa

> On Sep 21, 2017, at 10:14 AM, Josh McKenzie <jmcken...@apache.org> wrote:
>
> It at least skips the CDC* tests unless you use the test-cdc target, as
> it needs some different .yaml configurations so runs as a separate job.
> Not sure about any other skips.
>
> On Thu, Sep 21, 2017 at 12:29 PM, Tyagi, Preetika <preetika.ty...@intel.com> wrote:
>
>> Hi all,
>>
>> I downloaded and built the Cassandra project from GitHub and ran all
>> unit tests with the command below:
>>
>>     ant test -Dtest.runners=4
>>
>> When it finished, I saw a >99% success rate; however, it also showed
>> some number under "Skipped" tests. Does someone know why it would
>> skip some tests?
>>
>> Thanks,
>> Preetika
Why Cassandra unit test run skips some of it?
Hi all,

I downloaded and built the Cassandra project from GitHub and ran all unit tests with the command below:

    ant test -Dtest.runners=4

When it finished, I saw a >99% success rate; however, it also showed some number under "Skipped" tests. Does someone know why it would skip some tests?

Thanks,
Preetika
reclaim after memtable flush
Hi,

I'm trying to understand how Regions are allocated and deallocated in a Memtable. I can see that the same region is used for further allocations until the max region limit is hit. However, once the max limit is reached, the current region is set to null and eventually a new Region gets allocated.

My question is: is it possible for this filled region (which can be set to null) to have some valid data (from the current memtable) which hasn't been flushed yet? How is each memtable mapped to these regions?

Thanks,
Preetika
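The allocation pattern described above can be sketched as a slab-style allocator. This is a hedged simplification, not Cassandra's actual SlabAllocator/Region code: allocations bump an offset inside the current region; when a request no longer fits, the region is retired (only the `current` reference is dropped — the region itself is still referenced by the cells written into it, so its data stays live until the memtable is flushed) and a fresh region is opened.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy slab/region allocator. Assumes every request fits in one region
// (real code special-cases oversized allocations).
public class RegionAllocator
{
    static final int REGION_SIZE = 1024;

    static class Region
    {
        final byte[] data = new byte[REGION_SIZE];
        final AtomicInteger nextOffset = new AtomicInteger(0);

        // Bump-the-pointer allocation; returns start offset, or -1 if full.
        int allocate(int size)
        {
            while (true)
            {
                int old = nextOffset.get();
                if (old + size > REGION_SIZE)
                    return -1;
                if (nextOffset.compareAndSet(old, old + size))
                    return old;
            }
        }
    }

    private Region current = new Region();
    int regionsOpened = 1;

    int allocate(int size)
    {
        int off = current.allocate(size);
        if (off < 0)
        {
            // Current region is full: retire it and open a new one. The old
            // region's bytes remain live (reachable from the memtable's cells)
            // until the whole memtable is flushed and released together.
            current = new Region();
            regionsOpened++;
            off = current.allocate(size);
        }
        return off;
    }
}
```

So, to the question above: yes — a filled region that is no longer `current` can absolutely hold unflushed data; setting the reference to null only stops new allocations from landing there, and the memtable maps to the whole set of regions it allocated from, not just the latest one.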
RE: question on assigning JIRA ticket
Thank you, Jeff. It was really helpful! Also, do I need to request access in order to be able to assign an issue to myself? I cannot find an option to do that when I'm logged in.

Preetika

-----Original Message-----
From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Tuesday, September 19, 2017 9:50 AM
To: Cassandra DEV <dev@cassandra.apache.org>
Subject: Re: question on assigning JIRA ticket

If it's created by someone else but not assigned, you can assign it to yourself and begin work.

If it's created by someone else and assigned to someone else, you should post on the ticket and ask if they mind if you work on it instead. Sometimes people assign a ticket and then never work on it, and they won't mind if you take it. Sometimes they'll have started but hit a roadblock, and may be able to give you some code to start with. Just ask before you change the assignee.

- Jeff

On Tue, Sep 19, 2017 at 9:34 AM, Tyagi, Preetika <preetika.ty...@intel.com> wrote:
> Hi all,
>
> I'm trying to figure out the different ways in which one can contribute
> to Cassandra. I know one can create a ticket and assign it to oneself
> to work on. However, is it also allowed to assign oneself to, and work
> on, an already existing ticket created by someone else?
>
> Thanks,
> Preetika
question on assigning JIRA ticket
Hi all,

I'm trying to figure out the different ways in which one can contribute to Cassandra. I know one can create a ticket and assign it to oneself to work on. However, is it also allowed to assign oneself to, and work on, an already existing ticket created by someone else?

Thanks,
Preetika
RE: question on the code formatter
Thank you for the info! I tried it and it worked as expected.

-----Original Message-----
From: Murukesh Mohanan [mailto:murukesh.moha...@gmail.com]
Sent: Thursday, September 14, 2017 11:10 PM
To: dev@cassandra.apache.org
Subject: Re: question on the code formatter

The wiki seems to be outdated. See https://github.com/apache/cassandra/blob/trunk/doc/source/development/ide.rst:

> The project generated by the ant task ``generate-idea-files`` contains
> nearly everything you need to debug Cassandra and execute unit tests.
>
> * Run/debug defaults for JUnit
> * Run/debug configuration for Cassandra daemon
> * License header for Java source files
> * Cassandra code style
> * Inspections

You can just run `generate-idea-files` and then open the project in IDEA. Code style settings should be automatically picked up by IDEA.

On Fri, 15 Sep 2017 at 14:46 Tyagi, Preetika <preetika.ty...@intel.com> wrote:
> Hi all,
>
> I was trying to configure the Cassandra code formatter and downloaded
> IntelliJ-codestyle.jar from this link:
> https://wiki.apache.org/cassandra/CodeStyle
>
> After extracting this JAR, I was able to import
> codestyle/Default_1_.xml into my project and formatting seemed to work.
>
> However, I'm wondering what the options/code.style.schemes.xml file is
> exactly used for. Could anyone please give me an idea of whether I need
> to configure this as well?
>
> Thanks,
> Preetika

--
Murukesh Mohanan, Yahoo! Japan
RE: Proposal: Closing old, unable-to-repro JIRAs
+1. This is a good idea.

-----Original Message-----
From: beggles...@apple.com [mailto:beggles...@apple.com]
Sent: Friday, September 15, 2017 8:29 AM
To: dev@cassandra.apache.org
Subject: Re: Proposal: Closing old, unable-to-repro JIRAs

+1 to that

On September 14, 2017 at 4:50:54 PM, Jeff Jirsa (jji...@gmail.com) wrote:

There's a number of JIRAs that are old - sometimes very old - that represent bugs that either don't exist in modern versions, or don't have sufficient information for us to repro, but the reporter has gone away.

Would anyone be offended if I start tagging these with the label 'UnableToRepro' or 'Unresponsive' and start a 30-day timer to close them? Anyone have a better suggestion?
question on the code formatter
Hi all,

I was trying to configure the Cassandra code formatter and downloaded IntelliJ-codestyle.jar from this link: https://wiki.apache.org/cassandra/CodeStyle

After extracting this JAR, I was able to import codestyle/Default_1_.xml into my project, and formatting seemed to work.

However, I'm wondering what the options/code.style.schemes.xml file is exactly used for. Could anyone please give me an idea of whether I need to configure this as well?

Thanks,
Preetika