RE: MD5 in the read path

2018-09-26 Thread Tyagi, Preetika
Makes sense. Thanks!

-Original Message-
From: Joseph Lynch [mailto:joe.e.ly...@gmail.com] 
Sent: Wednesday, September 26, 2018 9:02 PM
To: dev@cassandra.apache.org
Subject: Re: MD5 in the read path

>
> Thank you all for the response.
> For RandomPartitioner, MD5 is used to avoid collision. However, why is 
> it necessary for comparing data between different replicas? Is it not 
> feasible to use CRC for data comparison?
>
My understanding is that it is not necessary to use MD5 and we can switch out 
the message digest function as long as we have an upgrade path. I believe this 
is the goal of https://issues.apache.org/jira/browse/CASSANDRA-13292.

-Joey

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



RE: MD5 in the read path

2018-09-26 Thread Tyagi, Preetika
Thank you all for the response.
For RandomPartitioner, MD5 is used to avoid collisions. However, why is it 
necessary for comparing data between different replicas? Is it not feasible to 
use CRC for data comparison?

Thanks,
Preetika

-Original Message-
From: Elliott Sims [mailto:elli...@backblaze.com] 
Sent: Wednesday, September 26, 2018 7:58 PM
To: dev@cassandra.apache.org
Subject: Re: MD5 in the read path

Would xxHash be large enough for digests?  Looks like there's no 128-bit 
version yet, and it seems like 64 bits would be a bit short to avoid accidental 
collisions/matches.  FarmHash128 or MetroHash128 might be a good choice.  Not 
quite as fast as xxHash64, but not far off and still much, much faster than MD5 
and somewhat faster than murmur3. May require some amount of benchmarking, 
since most of the performance comparisons are C and the performance of the Java 
implementations may vary drastically.
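The 64-vs-128-bit sizing concern can be made concrete with a birthday-bound estimate. This is an illustrative Python sketch (not from the thread); the trillion-comparison figure is an assumption standing in for "a busy cluster's lifetime of digest reads":

```python
import math

def collision_probability(n, bits):
    # Birthday bound: P(at least one collision among n random b-bit
    # digests) ~= 1 - exp(-n^2 / 2^(b+1)).
    return -math.expm1(-(n * n) / float(2 ** (bits + 1)))

n = 10 ** 12  # assumed: a trillion digest comparisons over a cluster's life
print(f"64-bit : {collision_probability(n, 64):.3e}")   # ~1.0: collisions certain
print(f"128-bit: {collision_probability(n, 128):.3e}")  # ~1.5e-15: effectively never
```

This is why 64-bit xxHash looks too short for replica digests while any decent 128-bit function is comfortably safe.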

Looks like https://issues.apache.org/jira/browse/CASSANDRA-13291 already 
switched to Guava, which probably makes Murmur3_128 the easiest of these to 
switch to, and it may be sufficiently faster than MD5 that anything beyond it 
hits diminishing returns anyway.

 (far from an expert, but this thread prompted me to go poking through hash 
options out of curiosity)

On Wed, Sep 26, 2018 at 9:04 PM Joseph Lynch  wrote:

> Michael Kjellman and others (Jason, Sam, et al.) have already done a 
> lot of work in 4.0 to help change the use of MD5 to something more modern 
> [1][2].
> Also I cut a ticket a little while back about the significant
> performance penalty of using MD5 for digests when doing quorum reads
> of wide partitions [3]. Given the profiling that Michael has done and
> the production profiling we did, I think it's fair to say that changing
> the digest from MD5 to murmur3 or xxHash would lead to a noticeable
> performance improvement for quorum reads, perhaps even something like
> a 2x throughput increase for e.g. wide partition workloads.
>
> The hard part is changing the digest hash without breaking older 
> versions, e.g. during a rolling restart you can't have one node give a 
> MD5 hash and the other give a xxHash hash as you'll end up with lots 
> of mismatches and read repairs ... so that would be the tricky part. I 
> believe that we just need to do what was done during the 3.0 storage 
> engine refactor (I can't remember the ticket but I'm pretty sure 
> Sylvain did the work) which checked the messaging version of the 
> destination node and sent the appropriate hash back.
>
> -Joey
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-13291
> [2] https://issues.apache.org/jira/browse/CASSANDRA-13292
> [3] https://issues.apache.org/jira/browse/CASSANDRA-14611
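The version-gating idea above can be sketched as follows. This is hypothetical Python, not Cassandra code: the version constant is made up, and `blake2b` merely stands in for a fast 128-bit hash such as xxHash or murmur3 (neither is in the Python standard library):

```python
import hashlib

# Hypothetical cutover point: peers at or below this messaging version
# still expect MD5 digests; upgraded peers get the faster function.
LAST_MD5_VERSION = 11

def digest_for_peer(peer_version, payload):
    # During a rolling restart, reply with whichever digest the
    # destination node understands, so digests always compare equal
    # between same-data replicas and no spurious read repairs fire.
    if peer_version <= LAST_MD5_VERSION:
        return hashlib.md5(payload).digest()
    # Stand-in for a fast 128-bit non-cryptographic digest.
    return hashlib.blake2b(payload, digest_size=16).digest()

row = b"serialized partition data"
assert digest_for_peer(11, row) == hashlib.md5(row).digest()
assert len(digest_for_peer(12, row)) == 16  # still 128 bits on the wire
```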
>
>
> On Wed, Sep 26, 2018 at 5:00 PM Elliott Sims 
> wrote:
>
> > They also don't matter for digests, as long as we're assuming all
> > nodes in the cluster are non-malicious (which is a pretty reasonable
> > and probably necessary assumption).  Or at least, deliberate collisions
> > don't.  Accidental collisions do, but 128 bits is enough to make that
> > sufficiently unlikely (as in, chances are nobody will ever see a single
> > collision).
> >
> > On Wed, Sep 26, 2018 at 7:58 PM Brandon Williams 
> wrote:
> >
> > > Collisions don't matter in the partitioner.
> > >
> > > On Wed, Sep 26, 2018, 6:53 PM Anirudh Kubatoor
> > > <anirudh.kubat...@gmail.com> wrote:
> > >
> > > > Isn't MD5 broken from a security standpoint? From wikipedia:
> > > > *"One basic requirement of any cryptographic hash function is
> > > > that it should be computationally infeasible
> > > > <https://en.wikipedia.org/wiki/Computational_complexity_theory#Intractability>
> > > > to find two non-identical messages which hash to the same value. MD5
> > > > fails this requirement catastrophically; such collisions
> > > > <https://en.wikipedia.org/wiki/Collision_resistance> can be found in
> > > > seconds on an ordinary home computer"*
> > > >
> > > > Regards,
> > > > Anirudh
> > > >
> > > > On Wed, Sep 26, 2018 at 7:14 PM Jeff Jirsa  wrote:
> > > >
> > > > > In some installations, it's used for hashing the partition key
> > > > > to find the host (RandomPartitioner)
> > > > > It's used for prepared statement IDs
> > > > > It's used for hashing the data for reads to know if the data
> > > > > matches on all
> 

MD5 in the read path

2018-09-26 Thread Tyagi, Preetika
Hi all,

I have a question about MD5 being used in the read path in Cassandra.
I wanted to understand what exactly it is being used for and why not something 
like CRC is used which is less complex in comparison to MD5.

Thanks,
Preetika



partitioning and CRC

2018-09-13 Thread Tyagi, Preetika
Hi all,

I am trying to understand where exactly digests and checksums are being used in 
Cassandra.
In my understanding,
Murmur3 hashing is used with murmur3 partitioning scheme which is also the 
default configuration.
CRC32 is used for data corruption and repair.
MD5 is used for partitioning only when random partitioning is configured.

Can you please correct if I'm missing something here?

Thanks,
Preetika
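For a concrete feel of the difference between the checksums and digests discussed in this thread, here is an illustrative standard-library demo. This is Python purely for exposition; Cassandra's actual implementations are Java:

```python
import hashlib
import zlib

payload = b"some row data"

# CRC32: a 32-bit checksum. Cheap and good at catching bit-flips in a
# single on-disk or on-the-wire copy, but only 4 bytes, so accidental
# matches between two *different* payloads are far too likely for
# cluster-lifetime replica comparisons.
crc = zlib.crc32(payload)
print(f"CRC32: {crc:#010x} (32 bits)")

# MD5: a 128-bit digest. What the read path compares across replicas;
# 16 bytes makes accidental collisions effectively impossible.
md5 = hashlib.md5(payload).digest()
print(f"MD5  : {md5.hex()} ({len(md5) * 8} bits)")
```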



RE: question on running cassandra-dtests

2018-03-28 Thread Tyagi, Preetika
Hi Ariel,

Yes, it is Linux. I checked Cassandra paths in the env output, looks fine to 
me. Attached is the complete output of both commands: pip list and env.

Will try keep-test-dir option as well.

Thanks,
Preetika

-Original Message-
From: Ariel Weisberg [mailto:ar...@weisberg.ws] 
Sent: Wednesday, March 28, 2018 2:38 PM
To: dev@cassandra.apache.org
Subject: Re: question on running cassandra-dtests

Hi,

Looks like you are running on Linux?

From inside your virtualenv can you run pip list? Can you also give us the 
output of "env"?

Looking at the error you got "NoSuchMethod" it suggests that the Cassandra you 
pointed to has inconsistent class files and jars or they change during the 
test.  It's statically compiled so it shouldn't be able to build with methods 
that can't be resolved at runtime. It can still happen if the libraries 
provided at runtime aren't the same as the ones provided at compile time.

It might help to run with --keep-test-dir and then check the test directory for 
Cassandra logs and review the classpath to make sure everything Cassandra is 
being loaded up with makes sense.

Ariel


On Wed, Mar 28, 2018, at 5:09 PM, Tyagi, Preetika wrote:
> I'm able to setup and run dtest now, however, I do see a lot of failures.
> 
> For example, I tried running "pytest --cassandra-dir=<cassandra home>
> nodetool_test.py::TestNodetool::test_correct_dc_rack_in_nodetool_info"
> and below is the snippet of errors:
> 
> platform linux -- Python 3.5.2, pytest-3.5.0, py-1.5.3, pluggy-0.6.0
> rootdir: /home//cassandra-dtest, inifile: pytest.ini
> plugins: timeout-1.2.1, flaky-3.4.0
> collected 1 item
> 
> nodetool_test.py FE                                                  [100%]
> 
> ================================== ERRORS ==================================
> ______ ERROR at teardown of TestNodetool.test_correct_dc_rack_in_nodetool_info ______
> Unexpected error found in node logs (see stdout for full details).
> Errors: [ERROR [MessagingService-NettyOutbound-Thread-4-1] 2018-03-28
> 00:24:36,784 OutboundHandshakeHandler.java:209 - Failed to properly 
> handshake with peer 127.0.0.1:7000 (GOSSIP). Closing the channel.
> java.lang.NoSuchMethodError: org.apache.cassandra.net.async.OutboundConnectionIdentifier.connectionAddress()Ljava/net/InetSocketAddress;
>   at org.apache.cassandra.net.async.OutboundHandshakeHandler.channelActive(OutboundHandshakeHandler.java:107) ~[main/:na]
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213) [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199) [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:192) [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1330) [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213) [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199) [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:910) [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.fulfillConnectPromise(AbstractEpollStreamChannel.java:855) [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.finishConnect(AbstractEpollStreamChannel.java:888) [netty-all-4.1.14.Final.jar:4.1.14.Final]

RE: question on running cassandra-dtests

2018-03-28 Thread Tyagi, Preetika
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199) [netty-all-4.1.14.Final.jar:4.1.14.Final]
  at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:192) [netty-all-4.1.14.Final.jar:4.1.14.Final]
  at io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1330) [netty-all-4.1.14.Final.jar:4.1.14.Final]
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213) [netty-all-4.1.14.Final.jar:4.1.14.Final]
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199) [netty-all-4.1.14.Final.jar:4.1.14.Final]
  at io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:910) [netty-all-4.1.14.Final.jar:4.1.14.Final]
  at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.fulfillConnectPromise(AbstractEpollStreamChannel.java:855) [netty-all-4.1.14.Final.jar:4.1.14.Final]
  at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.finishConnect(AbstractEpollStreamChannel.java:888) [netty-all-4.1.14.Final.jar:4.1.14.Final]
  at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollOutReady(AbstractEpollStreamChannel.java:907) [netty-all-4.1.14.Final.jar:4.1.14.Final]
  at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:394) [netty-all-4.1.14.Final.jar:4.1.14.Final]
  at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:296) [netty-all-4.1.14.Final.jar:4.1.14.Final]
  at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-all-4.1.14.Final.jar:4.1.14.Final]
  at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) [netty-all-4.1.14.Final.jar:4.1.14.Final]
  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151], ERROR [MessagingService-NettyOutbound-Thread-4-7] 2018-03-28 00:24:37,063 OutboundHandshakeHandler.java:209 - Failed to properly handshake with peer 127.0.0.3:7000 (GOSSIP). Closing the channel.

Can somebody help me figure out how I can run dtests successfully? Once I am 
able to do that, I will be able to proceed with the implementation of tests for 
the JIRA ticket I'm working on.

Thanks,
Preetika

-Original Message-
From: Ariel Weisberg [mailto:ar...@weisberg.ws] 
Sent: Tuesday, March 27, 2018 7:15 PM
To: dev@cassandra.apache.org
Subject: Re: question on running cassandra-dtests

Hi,

Great! Glad you were able to get up and running. The dtests can be tricky if 
you aren't already somewhat familiar with Python.

Ariel

On Mon, Mar 26, 2018, at 9:10 PM, Murukesh Mohanan wrote:
> On Tue, Mar 27, 2018 at 6:47 Ariel Weisberg <ar...@weisberg.ws> wrote:
> 
> > Hi,
> >
> > Are you deleting the venv before creating it? You shouldn't really 
> > need to use sudo for the virtualenv. That is going to make things 
> > potentially wonky. Naming it cassandra-dtest might also do something 
> > wonky if you have a cassandra-dtest directory already. I usually 
> > name it just venv and place it in the same subdir as the requirements file.
> >
> > Also running sudo is going to create a new shell and then exit the 
> > shell immediately so when you install the requirements it might be 
> > doing it not in the venv, but in whatever is going on inside the sudo shell.
> 
> 
> Yep, looking at the logs, that's probably the issue. When activating a 
> venv (with `source .../bin/activate`), it sets environment variables 
> (`PATH`, `PYTHONHOME` etc.) so that the virtualenv's Python, pip are 
> used instead of the system Python and pip. sudo defaults to using a 
> clean PATH and resetting most of the user's environment, so the 
> effects of the venv are lost when running in sudo.
> 
> 
> The advantage of virtualenv is not needing to mess with system 
> packages at
> > all so sudo is inadvisable when creating, activating, and pip 
> > installing things.
> >
> > You might need to use pip3 instead of pip, but I suspect that in a 
> > correct venv pip is going to point to pip3.
> >
> > Ariel
> >
> > On Mon, Mar 26, 2018, at 5:31 PM, Tyagi, Preetika wrote:
> > > Yes, that's correct. I followed README and ran all below steps to 
> > > create virtualenv. Attached is the output of all commands I ran 
> > > successfully except the last one i.e. pytest.
> > >
> > > Could you please let me know if you see anything wrong or missing?
> > >
> > > Thanks,
> > > Preetika
> > >
> > > -Original Message-
> > > From: Ariel Weisberg [mailto:ar...@weisberg.ws]
> > > Sent: Monda

RE: question on running cassandra-dtests

2018-03-26 Thread Tyagi, Preetika
Yes, that's correct. I followed README and ran all below steps to create 
virtualenv. Attached is the output of all commands I ran successfully except 
the last one i.e. pytest.

Could you please let me know if you see anything wrong or missing?

Thanks,
Preetika

-Original Message-
From: Ariel Weisberg [mailto:ar...@weisberg.ws] 
Sent: Monday, March 26, 2018 9:32 AM
To: dev@cassandra.apache.org
Subject: Re: question on running cassandra-dtests

Hi,

Your environment is python 2.7 when it should be python 3.
See:
>   File "/usr/local/lib/python2.7/dist-packages/_pytest/assertion/
> rewrite.py", line 213, in load_module

Are you using virtualenv to create a python 3 environment to use with the tests?
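The failing import from the reported traceback is exactly this Python 2/3 difference; under Python 3 it succeeds (illustrative):

```python
# Python 3: zip_longest lives in itertools; under Python 2 the same
# function is named izip_longest, hence the ImportError from conftest.py
# when pytest runs under a 2.7 interpreter.
from itertools import zip_longest

print(list(zip_longest("ab", [1, 2, 3])))  # [('a', 1), ('b', 2), (None, 3)]
```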

From README.md:

**Note**: While virtualenv isn't strictly required, using virtualenv is almost 
always the quickest path to success as it provides common base setup across 
various configurations.

1. Install virtualenv: ``pip install virtualenv``
2. Create a new virtualenv: ``virtualenv --python=python3 --no-site-packages ~/dtest``
3. Switch/Activate the new virtualenv: ``source ~/dtest/bin/activate``
4. Install remaining DTest Python dependencies: ``pip install -r /path/to/cassandra-dtest/requirements.txt``

Regards,
Ariel

On Mon, Mar 26, 2018, at 11:13 AM, Tyagi, Preetika wrote:
> I was able to run requirements.txt with success. Below is the error I get:
> 
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/_pytest/config.py", 
> line 371, in _importconftest
> mod = conftestpath.pyimport()
>   File "/usr/local/lib/python2.7/dist-packages/py/_path/local.py", 
> line 668, in pyimport
> __import__(modname)
>   File "/usr/local/lib/python2.7/dist-packages/_pytest/assertion/
> rewrite.py", line 213, in load_module
> py.builtin.exec_(co, mod.__dict__)
>   File "/usr/local/lib/python2.7/dist-packages/py/_builtin.py", line 
> 221, in exec_
> exec2(obj, globals, locals)
>   File "", line 7, in exec2
>   File "/home//conftest.py", line 11, in 
> from itertools import zip_longest
> ImportError: cannot import name zip_longest
> ERROR: could not load /home//conftest.py
> 
> Thanks,
> Preetika
> 
> -Original Message-
> From: Murukesh Mohanan [mailto:murukesh.moha...@gmail.com]
> Sent: Sunday, March 25, 2018 10:48 PM
> To: dev@cassandra.apache.org
> Subject: Re: question on running cassandra-dtests
> 
> The complete error is needed. I get something similar if I hadn't run
> `pip3 install -r requirements.txt`:
> 
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/_pytest/config.py", 
> line 328, in _getconftestmodules
> return self._path2confmods[path]
> KeyError: local('/home/muru/dev/cassandra-dtest')
> 
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/_pytest/config.py", 
> line 359, in _importconftest
> return self._conftestpath2mod[conftestpath]
> KeyError: local('/home/muru/dev/cassandra-dtest/conftest.py')
> 
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/_pytest/config.py", 
> line 365, in _importconftest
> mod = conftestpath.pyimport()
>   File "/usr/local/lib/python3.6/site-packages/py/_path/local.py", 
> line 668, in pyimport
> __import__(modname)
>   File "/usr/local/lib/python3.6/site-packages/_pytest/assertion/
> rewrite.py", line 212, in load_module
> py.builtin.exec_(co, mod.__dict__)
>   File "/home/muru/dev/cassandra-dtest/conftest.py", line 13, in 
> 
> from dtest import running_in_docker, 
> cleanup_docker_environment_before_test_execution
>   File "/home/muru/dev/cassandra-dtest/dtest.py", line 12, in 
> import cassandra
> ModuleNotFoundError: No module named 'cassandra'
> ERROR: could not load /home/muru/dev/cassandra-dtest/conftest.py
> 
> Of course, `pip3 install -r requirements.txt` creates an `src` 
> directory with appropriate branches of ccm and cassandra-driver checked out.
> 
> If you have run `pip3 install -r requirements.txt`, then something 
> else is wrong and we need the complete error log.
> 
> On 2018/03/23 20:22:47, "Tyagi, Preetika" <preetika.ty...@intel.com> wrote: 
> > Hi All,
> > 
> > I am trying to setup and run Cassandra-dtests so that I can write some 
> > tests for a JIRA ticket I have been working on.
> > This is the repo I am using: 
> > https://github.com/apache/cassandra-dtest
> > I followed all the instru

RE: question on running cassandra-dtests

2018-03-26 Thread Tyagi, Preetika
I was able to run requirements.txt with success. Below is the error I get:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/_pytest/config.py", line 371, in 
_importconftest
mod = conftestpath.pyimport()
  File "/usr/local/lib/python2.7/dist-packages/py/_path/local.py", line 668, in 
pyimport
__import__(modname)
  File "/usr/local/lib/python2.7/dist-packages/_pytest/assertion/rewrite.py", 
line 213, in load_module
py.builtin.exec_(co, mod.__dict__)
  File "/usr/local/lib/python2.7/dist-packages/py/_builtin.py", line 221, in 
exec_
exec2(obj, globals, locals)
  File "", line 7, in exec2
  File "/home//conftest.py", line 11, in 
from itertools import zip_longest
ImportError: cannot import name zip_longest
ERROR: could not load /home//conftest.py

Thanks,
Preetika

-Original Message-
From: Murukesh Mohanan [mailto:murukesh.moha...@gmail.com] 
Sent: Sunday, March 25, 2018 10:48 PM
To: dev@cassandra.apache.org
Subject: Re: question on running cassandra-dtests

The complete error is needed. I get something similar if I hadn't run `pip3 
install -r requirements.txt`:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/_pytest/config.py", line 328, in 
_getconftestmodules
return self._path2confmods[path]
KeyError: local('/home/muru/dev/cassandra-dtest')

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/_pytest/config.py", line 359, in 
_importconftest
return self._conftestpath2mod[conftestpath]
KeyError: local('/home/muru/dev/cassandra-dtest/conftest.py')

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/_pytest/config.py", line 365, in 
_importconftest
mod = conftestpath.pyimport()
  File "/usr/local/lib/python3.6/site-packages/py/_path/local.py", line 668, in 
pyimport
__import__(modname)
  File "/usr/local/lib/python3.6/site-packages/_pytest/assertion/rewrite.py", 
line 212, in load_module
py.builtin.exec_(co, mod.__dict__)
  File "/home/muru/dev/cassandra-dtest/conftest.py", line 13, in 
from dtest import running_in_docker, 
cleanup_docker_environment_before_test_execution
  File "/home/muru/dev/cassandra-dtest/dtest.py", line 12, in 
import cassandra
ModuleNotFoundError: No module named 'cassandra'
ERROR: could not load /home/muru/dev/cassandra-dtest/conftest.py

Of course, `pip3 install -r requirements.txt` creates an `src` directory with 
appropriate branches of ccm and cassandra-driver checked out.

If you have run `pip3 install -r requirements.txt`, then something else is 
wrong and we need the complete error log.

On 2018/03/23 20:22:47, "Tyagi, Preetika" <preetika.ty...@intel.com> wrote: 
> Hi All,
> 
> I am trying to setup and run Cassandra-dtests so that I can write some tests 
> for a JIRA ticket I have been working on.
> This is the repo I am using: https://github.com/apache/cassandra-dtest
> I followed all the instructions and installed dependencies.
> 
> However, when I run "pytest --cassandra-dir=<cassandra directory>"
> 
> It throws the error "could not load /conftest.py".
> 
> I checked that this file (conftest.py) exists in Cassandra-dtest source root 
> and I'm not sure why it cannot find it. Does anyone have any idea what might 
> be going wrong here?
> 
> I haven't used dtests before so I wonder if I'm missing something here.
> 
> Thanks,
> Preetika
> 
> 

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org





question on running cassandra-dtests

2018-03-23 Thread Tyagi, Preetika
Hi All,

I am trying to setup and run Cassandra-dtests so that I can write some tests 
for a JIRA ticket I have been working on.
This is the repo I am using: https://github.com/apache/cassandra-dtest
I followed all the instructions and installed dependencies.

However, when I run "pytest --cassandra-dir=<cassandra directory>"

It throws the error "could not load /conftest.py".

I checked that this file (conftest.py) exists in Cassandra-dtest source root 
and I'm not sure why it cannot find it. Does anyone have any idea what might be 
going wrong here?

I haven't used dtests before so I wonder if I'm missing something here.

Thanks,
Preetika



RE: Use of OpOrder in memtable

2018-02-13 Thread Tyagi, Preetika
Ah I see. That makes sense.
And it doesn't have anything to do with the read requests going on in parallel 
with write requests, right?
I mean we read the data from memtable depending on whatever has been written 
into memtable so far and return it to the client (of course including SSTable 
read and timestamp comparison etc.)

-Original Message-
From: Benedict Elliott Smith [mailto:bened...@apache.org] 
Sent: Tuesday, February 13, 2018 2:25 PM
To: dev@cassandra.apache.org
Subject: Re: Use of OpOrder in memtable

If you look closely, there can be multiple memtables extant at once.  While all 
"new" writes are routed to the latest memtable, there may still be writes that 
have begun but not yet completed.  The memtable cannot be flushed until any 
stragglers have completed, and some stragglers *may* still need to be routed to 
their designated memtable (if they had only just begun when the flush 
triggered).  It helps avoid these race conditions on either side of the 
equation.
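A toy sketch of that contract (illustrative Python only; Cassandra's real OpOrder is a lock-free chain of operation groups, not a lock-based one like this):

```python
import threading

class _Group:
    """One 'generation' of in-flight writes bound to one memtable."""
    def __init__(self):
        self.pending = 0
        self.done = threading.Condition()

class OpOrder:
    def __init__(self):
        self._lock = threading.Lock()
        self._group = _Group()

    def start(self):
        # A write registers with the current group before it touches the
        # memtable; it may still be running after a flush is triggered.
        with self._lock:
            g = self._group
            with g.done:
                g.pending += 1
            return g

    def finish(self, g):
        with g.done:
            g.pending -= 1
            if g.pending == 0:
                g.done.notify_all()

    def barrier(self):
        # Flush: seal the current group so new writes land in a fresh
        # group (the new memtable), then wait out the stragglers.
        with self._lock:
            sealed, self._group = self._group, _Group()
        with sealed.done:
            while sealed.pending > 0:
                sealed.done.wait()
        return sealed  # now safe to flush the sealed memtable
```

The barrier is what guarantees "no write that started before the flush is still mutating the old memtable" without blocking new writes, which go straight to the fresh group.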

On 13 February 2018 at 22:09, Tyagi, Preetika <preetika.ty...@intel.com>
wrote:

> Hi all,
>
> I'm trying to understand the behavior of memtable when writes/flush 
> operations are going on in parallel.
>
> In my understanding, once a memtable is full it is queued for flushing 
> and a new memtable is created for ongoing write operations.
> However, I was looking at the code and it looks like the OpOrder class 
> is used (don't understand all details) to ensure the synchronization 
> between producers (writes) and consumers (batch flushes).
> So I am a bit confused about when exactly it is needed. There will 
> always be only one latest memtable for write operations and all old 
> memtables are flushed so where this producer/consumer interaction on 
> the same memtable is needed?
>
> Thanks,
> Preetika
>
>


Use of OpOrder in memtable

2018-02-13 Thread Tyagi, Preetika
Hi all,

I'm trying to understand the behavior of memtable when writes/flush operations 
are going on in parallel.

In my understanding, once a memtable is full it is queued for flushing and a 
new memtable is created for ongoing write operations.
However, I was looking at the code and it looks like the OpOrder class is used 
(don't understand all details) to ensure the synchronization between producers 
(writes) and consumers (batch flushes).
So I am a bit confused about when exactly it is needed. There will always be 
only one latest memtable for write operations and all old memtables are flushed 
so where this producer/consumer interaction on the same memtable is needed?

Thanks,
Preetika



RE: range queries on partition key supported?

2018-01-31 Thread Tyagi, Preetika
Thank you, Kurt. Just one more clarification.

> > And, then entire partition on each node will be searched based on the
> > clustering key (i.e. "time" in this case).
> 
> No. it will skip to the section of the partition with time = '12:00'.
> Cassandra should be smart enough to avoid reading the whole partition.

Yeah, that seems to be correct. I probably didn't phrase it correctly.

Now let's assume a specific node is selected based on the token range and we 
need to look up for the data with time='12:00' within the partition which was 
obviously within token range.
Now, on this node there may be more than one partition (say, two) that falls 
within this token range. In that case, both partitions need to be looked up to 
get the data with the given time = 12:00.
So I'm wondering how these two partitions are located on this node. What would 
the query look like on this node to fetch them?
Does that make sense? Do you think I'm missing something?

Thanks,
Preetika

-Original Message-
From: kurt greaves [mailto:k...@instaclustr.com] 
Sent: Wednesday, January 31, 2018 9:46 PM
To: dev@cassandra.apache.org
Subject: Re: range queries on partition key supported?

>
> So that means more than one nodes can be selected to fulfill a range 
> query based on the token, correct?


Yes. When doing a token range query Cassandra will need to send requests to any 
node that owns part of the token range requested. This could be just one set of 
replicas or more, depending on how your token ring is arranged.
You could avoid querying multiple nodes by limiting the token() calls to be 
within one token range.

And, then entire partition on each node will be searched based on the
> clustering key (i.e. "time" in this case).

No. it will skip to the section of the partition with time = '12:00'.
Cassandra should be smart enough to avoid reading the whole partition.
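The fan-out behaviour described above can be illustrated with a toy ring (hypothetical Python with made-up token values; not driver code):

```python
# Toy 3-node ring with made-up token ranges (tuple: start, end, owner).
RING = [
    (-9_000, -3_000, "node1"),
    (-3_000,  3_000, "node2"),
    ( 3_000,  9_000, "node3"),
]

def replicas_for_token_range(lo, hi):
    # Every node whose range overlaps (lo, hi) must be contacted, so a
    # wide token() restriction fans out to several nodes, while a span
    # confined to one token range stays on one set of replicas.
    return [node for start, end, node in RING if lo < end and hi > start]

print(replicas_for_token_range(-2_000, 2_000))  # ['node2'] -- one node
print(replicas_for_token_range(-5_000, 5_000))  # all three nodes
```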


On 31 January 2018 at 06:57, Tyagi, Preetika <preetika.ty...@intel.com>
wrote:

> So that means more than one nodes can be selected to fulfill a range 
> query based on the token, correct?
>
> I was looking at this link: https://www.datastax.com/dev/ 
> blog/a-deep-look-to-the-cql-where-clause
>
> In the example query,
> SELECT * FROM numberOfRequests
> WHERE token(cluster, date) > token('cluster1', '2015-06-03')
> AND token(cluster, date) <= token('cluster1', '2015-06-05')
> AND time = '12:00'
>
> More than one nodes might get picked for this token based range query.
> And, then entire partition on each node will be searched based on the 
> clustering key (i.e. "time" in this case).
> Is my understanding correct?
>
> Thanks,
> Preetika
>
> -Original Message-
> From: J. D. Jordan [mailto:jeremiah.jor...@gmail.com]
> Sent: Tuesday, January 30, 2018 10:13 AM
> To: dev@cassandra.apache.org
> Subject: Re: range queries on partition key supported?
>
> A range query can be performed on the token of a partition key, not on 
> the value.
>
> -Jeremiah
>
> > On Jan 30, 2018, at 12:21 PM, Tyagi, Preetika 
> > <preetika.ty...@intel.com>
> wrote:
> >
> > Hi All,
> >
> > I have a quick question on Cassandra's behavior in case of partition
> keys. I know that range queries are allowed in general, however, is it 
> also allowed on partition keys as well? The partition key is used as 
> an input to determine a node in a cluster, so I'm wondering how one 
> can possibly perform range query on that.
> >
> > Thanks,
> > Preetika
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


RE: create branch in my github account

2018-01-31 Thread Tyagi, Preetika
Thank you Michael. I was able to create the branch and push my changes! :)

Preetika

-Original Message-
From: Michael Shuler [mailto:mshu...@pbandjelly.org] On Behalf Of Michael Shuler
Sent: Tuesday, January 30, 2018 2:04 PM
To: dev@cassandra.apache.org
Subject: Re: create branch in my github account

On 01/30/2018 03:47 PM, Tyagi, Preetika wrote:
> Hi all,
> 
> I'm working on the JIRA ticket CASSANDRA-13981 and pushed a patch 
> yesterday, however, I have been suggested to create a branch in my 
> github account and then push all changes into that. The patch is too 
> big hence this seems to be a better approach. I haven't done it before 
> so wanted to ensure I do it correctly without messing things up :)
> 
> 
> 1.  On Cassandra GitHub: https://github.com/apache/cassandra,
> click on "Fork" to create my own copy in my account.
> 
> 2.  Git clone on the forked branch above

s/branch/repository/ - this is a new forked repo, not a branch

> 3.  Git checkout 

git checkout trunk
  # since 13981 appears to for 4.0 (trunk)
  # if you worked off some random sha, you may need to rebase on
  # trunk HEAD, otherwise it may not cleanly merge and that will be
  # the first patch review request.

git checkout -b CASSANDRA-13981
  # create a new branch

> 4.  Apply my patch
> 
> 5.  Git commit -m ""
> 
> 6.  Git push origin trunk

git push origin CASSANDRA-13981  # push a new branch to your fork

> Please let me know if you notice any issues. Thanks for your help!

You could do this on the trunk branch of your fork, but it's probably better to 
create a new branch, so you can fetch changes from the upstream trunk branch and 
rebase your branch if needed. It is very common to have a number of remotes 
configured in your local repository: one for your fork, one for the apache 
upstream, ones for other users' forks, etc. If you do your work directly in 
your trunk branch, you'll have conflicts when pulling in new commits from 
apache/cassandra trunk, for example.

--
Michael

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org





RE: range queries on partition key supported?

2018-01-30 Thread Tyagi, Preetika
So that means more than one node can be selected to fulfill a range query 
based on the token, correct?

I was looking at this link: 
https://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause

In the example query,
SELECT * FROM numberOfRequests
WHERE token(cluster, date) > token('cluster1', '2015-06-03')
AND token(cluster, date) <= token('cluster1', '2015-06-05')
AND time = '12:00'

More than one node might get picked for this token-based range query, and then 
the entire partition on each node will be searched based on the clustering key 
(i.e. "time" in this case).
Is my understanding correct?

Thanks,
Preetika

-Original Message-
From: J. D. Jordan [mailto:jeremiah.jor...@gmail.com] 
Sent: Tuesday, January 30, 2018 10:13 AM
To: dev@cassandra.apache.org
Subject: Re: range queries on partition key supported?

A range query can be performed on the token of a partition key, not on the 
value.

-Jeremiah

> On Jan 30, 2018, at 12:21 PM, Tyagi, Preetika <preetika.ty...@intel.com> 
> wrote:
> 
> Hi All,
> 
> I have a quick question on Cassandra's behavior in the case of partition 
> keys. I know that range queries are allowed in general; however, are they 
> also allowed on partition keys? The partition key is used as an input to 
> determine a node in the cluster, so I'm wondering how one can possibly 
> perform a range query on that.
> 
> Thanks,
> Preetika
> 
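Jeremiah's point, that the scan is over tokens rather than key values, is why a single token range can span several nodes. Below is a toy sketch (made-up node tokens and a brute-force walk; not Cassandra's real ring code or API) of how a coordinator could work out which nodes a token range touches:

```java
import java.util.*;

public class TokenRingSketch {
    // A toy 3-node ring: each node owns the tokens up to its own token.
    // These numbers are made up for illustration; real tokens come from
    // Murmur3Partitioner and span the full range of long.
    static final long[] NODE_TOKENS = {100, 200, 300};

    // Which node owns a given token? Ownership here is the first node
    // whose token is >= t, wrapping around to the first node.
    static int ownerOf(long t) {
        for (int i = 0; i < NODE_TOKENS.length; i++)
            if (t <= NODE_TOKENS[i]) return i;
        return 0; // wrap around the ring
    }

    // Nodes whose ranges intersect the token range (start, end],
    // i.e. token(...) > start AND token(...) <= end.
    static Set<Integer> nodesForRange(long start, long end) {
        Set<Integer> nodes = new TreeSet<>();
        for (long t = start + 1; t <= end; t++)
            nodes.add(ownerOf(t));
        return nodes;
    }

    public static void main(String[] args) {
        // The token range (150, 250] straddles node 1's and node 2's
        // ranges, so the coordinator must contact both nodes.
        System.out.println(nodesForRange(150, 250)); // -> [1, 2]
    }
}
```

In the real system the coordinator intersects the query's token range with each replica's owned ranges analytically rather than walking token by token; the walk here only keeps the sketch short.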

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org





create branch in my github account

2018-01-30 Thread Tyagi, Preetika
Hi all,

I'm working on the JIRA ticket CASSANDRA-13981 and pushed a patch yesterday; 
however, it was suggested that I create a branch in my github account and push 
all my changes there. The patch is too big, so this seems to be a better 
approach.
I haven't done this before, so I wanted to make sure I do it correctly without 
messing things up :)


1.  On Cassandra GitHub: https://github.com/apache/cassandra, click on 
"Fork" to create my own copy in my account.

2.  Git clone on the forked branch above

3.  Git checkout 

4.  Apply my patch

5.  Git commit -m ""

6.  Git push origin trunk

Please let me know if you notice any issues. Thanks for your help!

Preetika






range queries on partition key supported?

2018-01-30 Thread Tyagi, Preetika
Hi All,

I have a quick question on Cassandra's behavior in the case of partition keys. 
I know that range queries are allowed in general; however, are they also 
allowed on partition keys? The partition key is used as an input to determine a 
node in the cluster, so I'm wondering how one can possibly perform a range 
query on that.

Thanks,
Preetika



RE: simple vs complex cells

2018-01-23 Thread Tyagi, Preetika
I agree. But this method is probably calculating the deletion time to determine 
the removal strategy for this complex column.
However, when someone tries to read this complex column and there are multiple 
copies of it in more than one sstable, how will the read request determine 
which copy of the complex column is the latest and should be returned to the 
user?

Preetika

-Original Message-
From: Minh Do [mailto:m...@netflix.com.INVALID] 
Sent: Tuesday, January 23, 2018 5:24 PM
To: dev@cassandra.apache.org
Subject: Re: simple vs complex cells

Based on C* 3.x code base, I believe a complex column consists of many cells 
and each cell has its own timestamp.

Then, there is a method to compute the maxTimestamp for a complex column:


public long maxTimestamp()
{
    long timestamp = complexDeletion.markedForDeleteAt();
    for (Cell cell : this)
        timestamp = Math.max(timestamp, cell.timestamp());
    return timestamp;
}
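Since each cell carries its own timestamp, reconciliation across sstables happens cell by cell rather than via a single column-level timestamp. A toy sketch of that idea (simplified stand-in types, not Cassandra's Cell/ComplexColumnData classes; real Cassandra also breaks timestamp ties deterministically by comparing values):

```java
import java.util.*;

// Toy reconciliation sketch: a "complex column" is modeled as a map of
// cells keyed by cell path (e.g. a map key), and each cell carries its
// own timestamp. Merging two sstable copies keeps, per path, the cell
// with the higher timestamp - no single column-level timestamp is needed.
public class CellMergeSketch {
    record Cell(String value, long timestamp) {}

    static Map<String, Cell> merge(Map<String, Cell> a, Map<String, Cell> b) {
        Map<String, Cell> merged = new TreeMap<>(a);
        // Map.merge passes (existing, incoming) to the remapping function.
        b.forEach((path, cell) ->
            merged.merge(path, cell,
                (x, y) -> x.timestamp() >= y.timestamp() ? x : y));
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> sstable1 = Map.of(
            "k1", new Cell("old", 10), "k2", new Cell("only-in-1", 10));
        Map<String, Cell> sstable2 = Map.of(
            "k1", new Cell("new", 20));

        Map<String, Cell> result = merge(sstable1, sstable2);
        System.out.println(result.get("k1").value()); // new (ts 20 > 10)
        System.out.println(result.get("k2").value()); // only-in-1
    }
}
```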


On Tue, Jan 23, 2018 at 4:22 PM, Tyagi, Preetika <preetika.ty...@intel.com>
wrote:

> Hi all,
>
> I'm trying to understand the behavior of simple and complex columns in 
> Cassandra.
> I was looking at UnfilteredSerializer.java: serializeRowBody() checks 
> for a timestamp flag and only then writes it. In the case of 
> writeComplexColumn(), there is no timestamp being written. Also, as 
> per my understanding, a complex column contains several simple columns, 
> each of which may or may not have an associated timestamp.
>
> My question is: if there is no mandatory timestamp for either simple or 
> complex columns, how will the data be merged at the time of a read 
> request based on timestamps, given that there can be more than one 
> copy of the same data in sstables?
>
> Also, is it allowed in CQL queries to update one or more simple 
> columns within a complex column? Or is the entire complex column 
> updated whenever there is an update query?
>
> Thanks,
> Preetika
>
>


simple vs complex cells

2018-01-23 Thread Tyagi, Preetika
Hi all,

I'm trying to understand the behavior of simple and complex columns in 
Cassandra.
I was looking at UnfilteredSerializer.java: serializeRowBody() checks for a 
timestamp flag and only then writes it. In the case of writeComplexColumn(), 
there is no timestamp being written. Also, as per my understanding, a complex 
column contains several simple columns, each of which may or may not have an 
associated timestamp.

My question is: if there is no mandatory timestamp for either simple or complex 
columns, how will the data be merged at the time of a read request based on 
timestamps, given that there can be more than one copy of the same data in 
sstables?

Also, is it allowed in CQL queries to update one or more simple columns within 
a complex column? Or is the entire complex column updated whenever there is an 
update query?

Thanks,
Preetika



code formatting

2018-01-18 Thread Tyagi, Preetika
Hi,

I have a quick question on the code formatting for Cassandra using IntelliJ.
I found a code formatter JAR here for Cassandra: 
https://wiki.apache.org/cassandra/CodeStyle?action=AttachFile&do=view&target=intellij-codestyle.jar

Does someone know how it can be imported into the IntelliJ Cassandra project 
settings so that the code can be formatted automatically (using the 
Ctrl + Shift + F command)? Or is there a better way to do it?

Thanks,
Preetika



Question on submitting a patch

2018-01-05 Thread Tyagi, Preetika
Hi all,

When I click on the "Submit Patch" option, it pops up a new screen where it 
asks for a bunch of details, including Fix Version(s). Does the patch need to 
be synced up with the latest repo, or can I just choose the version I worked 
with (which may not necessarily be the latest, and hence one would need to 
fetch that specific repo version in order to compile the source with the 
patch)?

Also, there is no option to upload the patch file on this screen. Can someone 
point out where to actually upload the patch? I haven't done this before, so I 
might be asking dumb questions! :)

Thanks,
Preetika




RE: How to fetch replication factor of a given keyspace

2017-12-21 Thread Tyagi, Preetika
Yeah, I tried doing it. The problem is when I call:

Schema.instance.getKeyspaceInstance(keyspaceName) or getKeyspaceMetadata();

It returns null for all keyspace names. Is it expected?

Thanks,
Preetika

-Original Message-
From: Nate McCall [mailto:zznat...@gmail.com] 
Sent: Wednesday, December 20, 2017 6:09 PM
To: dev <dev@cassandra.apache.org>
Subject: Re: How to fetch replication factor of a given keyspace

I think you want:
Schema.instance.getKeyspaceMetadata

There is a ReplicationParams nested under there which should have everything 
you need fully populated.
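For the ReplicationParams that Nate mentions, the replication factor is one of the string options in the keyspace's replication map: SimpleStrategy keeps a single replication_factor entry, while NetworkTopologyStrategy keeps one entry per datacenter. A toy sketch of pulling the factor out of such a map (a hypothetical helper mirroring the shape of the options, not Cassandra's actual API):

```java
import java.util.Map;

public class ReplicationParamsSketch {
    // Extract a replication factor from replication options shaped like
    // the keyspace's replication map. "dc" is only consulted for
    // NetworkTopologyStrategy, which stores per-datacenter factors.
    static int replicationFactor(Map<String, String> params, String dc) {
        if ("SimpleStrategy".equals(stripPackage(params.get("class"))))
            return Integer.parseInt(params.getOrDefault("replication_factor", "1"));
        // NetworkTopologyStrategy: one entry per datacenter
        return Integer.parseInt(params.getOrDefault(dc, "0"));
    }

    // The class option may be fully qualified, e.g.
    // "org.apache.cassandra.locator.SimpleStrategy".
    static String stripPackage(String cls) {
        return cls == null ? "" : cls.substring(cls.lastIndexOf('.') + 1);
    }

    public static void main(String[] args) {
        Map<String, String> simple =
            Map.of("class", "SimpleStrategy", "replication_factor", "3");
        Map<String, String> nts =
            Map.of("class", "NetworkTopologyStrategy", "dc1", "3", "dc2", "2");

        System.out.println(replicationFactor(simple, "dc1")); // 3
        System.out.println(replicationFactor(nts, "dc2"));    // 2
    }
}
```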

On Thu, Dec 21, 2017 at 2:02 PM, Tyagi, Preetika <preetika.ty...@intel.com> 
wrote:
> Hi,
>
> If I need to get the replication factor of a given keyspace in nodetool 
> commands (e.g. status), how can I do that? I'm trying to figure it out for a 
> JIRA item I'm working on.
>
> I tried using the below:
> Keyspace keyspace = Keyspace.open(keyspaceName);
> int rf = keyspace.getReplicationStrategy().getReplicationFactor();
>
> However, it runs into some issues since internally something doesn't get 
> initialized while looking up keyspaces/metadata.
>
> Any ideas on how I can approach it differently?
>
> Thanks,
> Preetika
>




How to fetch replication factor of a given keyspace

2017-12-20 Thread Tyagi, Preetika
Hi,

If I need to get the replication factor of a given keyspace in nodetool 
commands (e.g. status), how can I do that? I'm trying to figure it out for a 
JIRA item I'm working on.

I tried using the below:
Keyspace keyspace = Keyspace.open(keyspaceName);
int rf = keyspace.getReplicationStrategy().getReplicationFactor();

However, it runs into some issues since internally something doesn't get 
initialized while looking up keyspaces/metadata.

Any ideas on how I can approach it differently?

Thanks,
Preetika



how to build nodetool source

2017-10-23 Thread Tyagi, Preetika
Hi all,

I might be missing something very simple here but it seems I cannot find a way 
to build tools/nodetool/* source files correctly in my dev set up.

For example, when I make a simple code change at line 105 
(System.out.println("Status=Up/Down");) in Status.java to print something else 
and run an ant build, it doesn't get printed when I run the "nodetool status" 
command next time. It still prints the old message.

I'm using IntelliJ for development and building the code. I would appreciate 
any help!

Thanks,
Preetika



RE: Why Cassandra unit test run skips some of it?

2017-09-21 Thread Tyagi, Preetika
Yeah, I actually figured it out. The CDC tests were ignored, as Josh mentioned, 
since CDC was disabled. I looked at the code, and the other skipped tests were 
also marked as ignored.
Thank you both for the response!

Preetika

-Original Message-
From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Thursday, September 21, 2017 11:03 AM
To: dev@cassandra.apache.org
Subject: Re: Why Cassandra unit test run skips some of it?

There’s also at least one test we skip in circleci where we know the container 
memory is insufficient for the test - based on environment variables in that 
case. 

-- 
Jeff Jirsa
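The environment-variable gating Jeff describes can be sketched as follows (a toy model; TEST_CONTAINER_MB is a made-up variable name, and Cassandra's actual CI checks are different):

```java
public class SkipSketch {
    // Decide pass/skip given the memory the container reports. In a real
    // CI setup the decision would feed a test framework's "assume"
    // mechanism instead of returning a string.
    static String runTest(String name, long requiredMb, long availableMb) {
        if (availableMb < requiredMb)
            return name + ": SKIPPED (needs " + requiredMb
                   + " MB, have " + availableMb + ")";
        return name + ": PASSED";
    }

    public static void main(String[] args) {
        // "TEST_CONTAINER_MB" is a hypothetical variable for illustration.
        long availableMb = Long.parseLong(
            System.getenv().getOrDefault("TEST_CONTAINER_MB", "2048"));
        // With the 2048 MB default: the first is skipped, the second passes.
        System.out.println(runTest("bigCompactionTest", 4096, availableMb));
        System.out.println(runTest("smallUnitTest", 512, availableMb));
    }
}
```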


> On Sep 21, 2017, at 10:14 AM, Josh McKenzie <jmcken...@apache.org> wrote:
> 
> It at least skips the CDC* tests unless you use the test-cdc target, as it
> needs some different .yaml configurations so runs as a separate job. Not
> sure about any other skips.
> 
> On Thu, Sep 21, 2017 at 12:29 PM, Tyagi, Preetika <preetika.ty...@intel.com>
> wrote:
> 
>> Hi all,
>> 
>> I downloaded and built the Cassandra project from GitHub and ran all unit
>> tests by running the below command:
>> 
>> ant test -Dtest.runners=4
>> 
>> When it finished, I saw a >99% success rate; however, it also showed some
>> number under "Skipped" tests. Does anyone know why it would skip
>> some tests?
>> 
>> Thanks,
>> Preetika
>> 




Why Cassandra unit test run skips some of it?

2017-09-21 Thread Tyagi, Preetika
Hi all,

I downloaded and built the Cassandra project from GitHub and ran all unit tests 
by running the below command:

ant test -Dtest.runners=4

When it finished, I saw a >99% success rate; however, it also showed some 
number under "Skipped" tests. Does anyone know why it would skip some tests?

Thanks,
Preetika


reclaim after memtable flush

2017-09-20 Thread Tyagi, Preetika
Hi,

I'm trying to understand how Regions are allocated and deallocated in a 
Memtable. I can see that the same region is used for further allocations until 
the max region limit is hit. However, once the max limit is reached, the 
current region is set to null and eventually a new Region gets allocated.

My question is: is it possible for this filled region (whose reference can be 
set to null) to still hold valid data (from the current memtable) that hasn't 
been flushed yet? How is each memtable mapped to these regions?

Thanks,
Preetika
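For what it's worth, here is a toy model of the slab-style allocation described above (a simplification, not Cassandra's SlabAllocator code): nulling out the reference to the current region only rotates to a fresh region, while the filled region stays reachable from the buffers allocated out of it until the memtable is flushed. This illustrates one plausible reading, not a definitive answer:

```java
import java.util.*;

public class SlabSketch {
    static final int REGION_SIZE = 1024;
    static byte[] currentRegion = new byte[REGION_SIZE];
    static int offset = 0;
    // Filled regions still referenced by the cells cut from them; they
    // become garbage-collectable only after the memtable is flushed.
    static List<byte[]> retired = new ArrayList<>();

    // Cut 'size' bytes from the current region; rotate to a new region
    // when the request no longer fits. Returns the start offset within
    // whichever region served the request.
    static int allocate(int size) {
        if (offset + size > REGION_SIZE) {
            retired.add(currentRegion); // old region still holds live data
            currentRegion = new byte[REGION_SIZE];
            offset = 0;
        }
        int start = offset;
        offset += size;
        return start;
    }

    public static void main(String[] args) {
        allocate(1000);
        allocate(100); // doesn't fit: forces a rotation to a new region
        System.out.println("retired regions holding live data: " + retired.size());
    }
}
```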



RE: question on assigning JIRA ticket

2017-09-19 Thread Tyagi, Preetika
Thank you, Jeff. It was really helpful!

Also, do I need to request access in order to be able to assign an issue to 
myself? I cannot find an option to do that when I'm logged in.

Preetika

-Original Message-
From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Tuesday, September 19, 2017 9:50 AM
To: Cassandra DEV <dev@cassandra.apache.org>
Subject: Re: question on assigning JIRA ticket

If it's created by someone else but not assigned, you can assign it to yourself 
and begin work

If it's created by someone else and assigned to someone else, you should post 
on the ticket and ask if they mind if you work on it instead.
Sometimes people assign and then never work on it, and they won't mind if you 
take it. Sometimes they'll have started but hit a road block, and may be able 
to give you some code to start with. Just ask before you change the assignee.

- Jeff

On Tue, Sep 19, 2017 at 9:34 AM, Tyagi, Preetika <preetika.ty...@intel.com>
wrote:

> Hi all,
>
> I'm trying to figure out different ways in which one can contribute to 
> Cassandra. I know one can create a ticket and assign it to himself to 
> work on it. However, is it also allowed to assign and work on an 
> already existing ticket created by someone else?
>
> Thanks,
> Preetika
>
>


question on assigning JIRA ticket

2017-09-19 Thread Tyagi, Preetika
Hi all,

I'm trying to figure out different ways in which one can contribute to 
Cassandra. I know one can create a ticket and assign it to himself to work on 
it. However, is it also allowed to assign and work on an already existing 
ticket created by someone else?

Thanks,
Preetika



RE: question on the code formatter

2017-09-15 Thread Tyagi, Preetika
Thank you for the info! I tried it and it worked as expected.


-Original Message-
From: Murukesh Mohanan [mailto:murukesh.moha...@gmail.com] 
Sent: Thursday, September 14, 2017 11:10 PM
To: dev@cassandra.apache.org
Subject: Re: question on the code formatter

The wiki seems to be outdated. See
https://github.com/apache/cassandra/blob/trunk/doc/source/development/ide.rst
:


> The project generated by the ant task ``generate-idea-files`` contains
nearly everything
> you need to debug Cassandra and execute unit tests.
>
> * Run/debug defaults for JUnit
> * Run/debug configuration for Cassandra daemon
> * License header for Java source files
> * Cassandra code style
> * Inspections

You can just run `generate-idea-files` and then open the project in IDEA.
Code style settings should be automatically picked up by IDEA.

On Fri, 15 Sep 2017 at 14:46 Tyagi, Preetika <preetika.ty...@intel.com>
wrote:

> Hi all,
>
> I was trying to configure the Cassandra code formatter and downloaded 
> IntelliJ-codestyle.jar from this link:
> https://wiki.apache.org/cassandra/CodeStyle
>
> After extracting this JAR, I was able to import 
> codestyle/Default_1_.xml into my project and formatting seemed to work.
>
> However, I'm wondering what the options/code.style.schemes.xml file is 
> actually used for. Could anyone please give me an idea of whether I 
> need to configure this as well?
>
> Thanks,
> Preetika
>
>
> --

Murukesh Mohanan,
Yahoo! Japan


RE: Proposal: Closing old, unable-to-repro JIRAs

2017-09-15 Thread Tyagi, Preetika
+1

This is a good idea.

-Original Message-
From: beggles...@apple.com [mailto:beggles...@apple.com] 
Sent: Friday, September 15, 2017 8:29 AM
To: dev@cassandra.apache.org
Subject: Re: Proposal: Closing old, unable-to-repro JIRAs

+1 to that


On September 14, 2017 at 4:50:54 PM, Jeff Jirsa (jji...@gmail.com) wrote:

There's a number of JIRAs that are old - sometimes very old - that represent 
bugs that either don't exist in modern versions, or don't have sufficient 
information for us to repro, but the reporter has gone away. 

Would anyone be offended if I start tagging these with the label 
'UnableToRepro' or 'Unresponsive' and start a 30 day timer to close them? 
Anyone have a better suggestion? 




question on the code formatter

2017-09-14 Thread Tyagi, Preetika
Hi all,

I was trying to configure the Cassandra code formatter and downloaded 
IntelliJ-codestyle.jar from this link: 
https://wiki.apache.org/cassandra/CodeStyle

After extracting this JAR, I was able to import codestyle/Default_1_.xml into 
my project and formatting seemed to work.

However, I'm wondering what the options/code.style.schemes.xml file is actually 
used for. Could anyone please give me an idea of whether I need to configure 
this as well?

Thanks,
Preetika