Re: CQL: fails to COPY FROM with null values

2017-06-19 Thread Stefania Alborghetti
It doesn't work because of the whitespace. By default the NULL value is an
empty string, and extra whitespace is not trimmed automatically.

This should work:

ce98d62a-3666-4d3a-ae2f-df315ad448aa,Jonsson,Malcom,,2001-01-19
17:55:17+

You can change the string representing missing values with the NULL option
if you cannot remove spaces from your data.
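
For example, a sketch based on the table in this thread (assuming the empty
UUID fields contain exactly one space, and adjusting the column list to
match the CSV):

COPY playground.individual (id, lastname, firstname, address_id, dateofbirth)
FROM 'testfile.csv' WITH NULL=' ';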

On Mon, Jun 19, 2017 at 10:10 PM, Tobias Eriksson <
tobias.eriks...@qvantel.com> wrote:

> Hi
>
>  I am trying to copy a file of CSV data into a table
>
> But I get an error since sometimes one of the columns (which is a UUID) is
> empty
>
> Is this a bug or what am I missing?
>
>
>
> Here is how it looks like
>
> Table
>
> id uuid,
>
> lastname text,
>
> firstname text,
>
> address_id uuid,
>
> dateofbirth timestamp,
>
>
>
> PRIMARY KEY (id, lastname, firstname)
>
>
>
> COPY playground.individual(id,lastname,firstname,address_id) FROM
> ‘testfile.csv’;
>
>
>
> Where the testfile.csv is like this
>
>
>
> This works !!!
>
> ce98d62a-3666-4d3a-ae2f-df315ad448aa,Jonsson,Malcom
> ,c9dc8b60-d27f-430c-b960-782d854df3a5,2001-01-19 17:55:17+
>
>
>
> This does NOT work !!!
>
> ce98d62a-3666-4d3a-ae2f-df315ad448aa,Jonsson,Malcom , ,2001-01-19
> 17:55:17+
>
>
>
> Cause then I get the following error
>
> *Failed to import 1 rows: ParseError - badly formed hexadecimal UUID
> string,  given up without retries*
>
>
>
> So, how do I import my CSV file and set the columns which do not have a
> UUID to null?
>
>
>
> -Tobias
>



-- 

STEFANIA ALBORGHETTI

Software engineer | +852 6114 9265 | stefania.alborghe...@datastax.com


Re: Copy from CSV on OS X problem with varint values <= -2^63

2017-04-05 Thread Stefania Alborghetti
That doesn't look like the embedded driver; the embedded driver comes from a
zip file labeled with version *3.7.0.post0-2481531* for Cassandra 3.10:

*Using CQL driver: *

Sorry, I should have posted this example in my previous email, rather than
an example based on the non-embedded driver.

I don't know who to contact regarding the homebrew installation, but you
could download the Cassandra package <http://cassandra.apache.org/download/>,
unpack it, and run Cassandra and cqlsh from that directory.
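
For example, a rough sketch (the exact archive name depends on the version
you download):

tar xzf apache-cassandra-3.10-bin.tar.gz
cd apache-cassandra-3.10
bin/cassandra        # start the node from the unpacked directory
bin/cqlsh --debug    # should report the bundled driver from the lib folder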


On Thu, Apr 6, 2017 at 4:59 AM, Boris Babic <bo...@icloud.com> wrote:

> Stefania
>
> This is the output of my --debug, I never touched CQLSH_NO_BUNDLED and did
> not know about it.
> As you can see I have used homebrew to install Cassandra and looks like
> its the embedded version as it sits under the Cassandra folder ?
>
> cqlsh --debug
> Using CQL driver:  cassandra/3.10_1/libexec/vendor/lib/python2.7/site-
> packages/cassandra/__init__.pyc'>
> Using connect timeout: 5 seconds
> Using 'utf-8' encoding
> Using ssl: False
> Connected to Test Cluster at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.10 | CQL spec 3.4.4 | Native protocol v4]
> Use HELP for help.
>
>
> On Apr 5, 2017, at 12:07 PM, Stefania Alborghetti <stefania.alborghetti@
> datastax.com> wrote:
>
> You are welcome.
>
> I traced the problem to a commit of the Python driver that shipped in
> version 3.8 of the driver. It is fixed in 3.8.1. More details
> on CASSANDRA-13408. I don't think it's related to the OS.
>
> Since Cassandra 3.10 ships with an older version of the driver embedded in
> a zip file in the lib folder, and this version is not affected, I'm
> guessing that either the embedded version does not work on OS X, or you are
> manually using a different version of the driver by
> setting CQLSH_NO_BUNDLED (which is why I could reproduce it on my laptop).
>
> You can run cqlsh with --debug to see the version of the driver that cqlsh
> is using, for example:
>
> cqlsh --debug
> Using CQL driver:  dist-packages/cassandra_driver-3.8.1-py2.7-linux-x86_
> 64.egg/cassandra/__init__.pyc'>
>
> Can you confirm if you were overriding the Python driver by
> setting CQLSH_NO_BUNDLED and the version of the driver?
>
>
>
> On Tue, Apr 4, 2017 at 6:12 PM, Boris Babic <bo...@icloud.com> wrote:
> Thanks Stefania, going from memory don't think I noticed this on windows
> but haven't got a machine handy to test it on at the moment.
>
> On Apr 4, 2017, at 19:44, Stefania Alborghetti <stefania.alborghetti@
> datastax.com> wrote:
>
> I've reproduced the same problem on Linux, and I've
> opened CASSANDRA-13408. As a workaround, disable prepared statements and it
> will work (WITH HEADER = TRUE AND PREPAREDSTATEMENTS = False).
>
> On Tue, Apr 4, 2017 at 5:02 PM, Boris Babic <bo...@icloud.com> wrote:
>
> On Apr 4, 2017, at 7:00 PM, Boris Babic <bo...@icloud.com> wrote:
>
> Hi
>
> I’m testing the write of various datatypes on OS X for fun running
> cassandra 3.10 on a single laptop instance, and from what i can see
> varint should map to java.math.BigInteger and have no problems with
> Long.MIN_VALUE, -9223372036854775808, but i can’t see what I’m doing wrong.
>
> cqlsh: 5.0.1
> cassandra 3.10
> osx el capitan.
>
> data.csv:
>
> id,varint
> -2147483648,-9223372036854775808
> 2147483647,9223372036854775807
>
> COPY mykeyspace.data (id,varint) FROM 'data.csv' WITH HEADER=true;
>
>   Failed to make batch statement: Received an argument of invalid type
> for column "varint". Expected: ,
> Got: ; (descriptor 'bit_length' requires a 'int' object but
> received a 'long’)
>
> If I directly type a similar insert in cqlsh no such problem occurs, in
> fact I can make the value many orders of magnitude less and all is fine.
>
> cqlsh> insert into mykeyspace.data (id,varint) 
> values(1,-9223372036854775808898989898)
> ;
>
> Had not observed this before on other OS, is this something todo with the
> way the copy from parser is interpreting varint for values <= -2^63 ?
>
> Thanks for any input
> Boris
>
>
>
>
>
>
>
>
>
>
>
>
> --
>
> STEFANIA ALBORGHETTI
> Software engineer | +852 6114 9265 |
> stefania.alborghe...@datastax.com
>
>
>
>
>
> --
>
> STEFANIA ALBORGHETTI
> Software engineer | +852 6114 9265 |
> stefania.alborghe...@datastax.com
>
>
>
>


-- 

STEFANIA ALBORGHETTI

Software engineer | +852 6114 9265 | stefania.alborghe...@datastax.com


Re: Copy from CSV on OS X problem with varint values <= -2^63

2017-04-04 Thread Stefania Alborghetti
You are welcome.

I traced the problem to a commit in the Python driver that shipped in
driver version 3.8. It is fixed in 3.8.1. More details
CASSANDRA-13408 <https://issues.apache.org/jira/browse/CASSANDRA-13408>. I
don't think it's related to the OS.

Since Cassandra 3.10 ships with an older version of the driver embedded in
a zip file in the lib folder, and this version is not affected, I'm
guessing that either the embedded version does not work on OS X, or you are
manually using a different version of the driver by setting CQLSH_NO_BUNDLED
(which is why I could reproduce it on my laptop).

You can run cqlsh with --debug to see the version of the driver that cqlsh
is using, for example:

*cqlsh --debug*
*Using CQL driver: *

Can you confirm if you were overriding the Python driver by setting
CQLSH_NO_BUNDLED and the version of the driver?



On Tue, Apr 4, 2017 at 6:12 PM, Boris Babic <bo...@icloud.com> wrote:

> Thanks Stefania, going from memory don't think I noticed this on windows
> but haven't got a machine handy to test it on at the moment.
>
> On Apr 4, 2017, at 19:44, Stefania Alborghetti <stefania.alborghetti@
> datastax.com> wrote:
>
> I've reproduced the same problem on Linux, and I've opened CASSANDRA-13408
> <https://issues.apache.org/jira/browse/CASSANDRA-13408>. As a workaround,
> disable prepared statements and it will work (WITH HEADER = TRUE AND
> PREPAREDSTATEMENTS = False).
>
> On Tue, Apr 4, 2017 at 5:02 PM, Boris Babic <bo...@icloud.com> wrote:
>
>>
>> On Apr 4, 2017, at 7:00 PM, Boris Babic <bo...@icloud.com> wrote:
>>
>> Hi
>>
>> I’m testing the write of various datatypes on OS X for fun running
>> cassandra 3.10 on a single laptop instance, and from what i can see varint
>> should map to java.math.BigInteger and have no problems with Long.MIN_VALUE
>> , -9223372036854775808, but i can’t see what I’m doing wrong.
>>
>> cqlsh: 5.0.1
>> cassandra 3.10
>> osx el capitan.
>>
>> data.csv:
>>
>> id,varint
>> -2147483648,-9223372036854775808
>> 2147483647,9223372036854775807
>>
>> COPY mykeyspace.data (id,varint) FROM 'data.csv' WITH HEADER=true;
>>
>> *  Failed to make batch statement: Received an argument of invalid
>> type for column "varint". Expected: > 'cassandra.cqltypes.IntegerType'>, Got: ; (descriptor
>> 'bit_length' requires a 'int' object but received a 'long’)*
>>
>> If I directly type a similar insert in cqlsh no such problem occurs, in
>> fact I can make the value many orders of magnitude less and all is fine.
>>
>> cqlsh> insert into mykeyspace.data (id,varint)
>> values(1,-9223372036854775808898989898) ;
>>
>> Had not observed this before on other OS, is this something todo with the
>> way the copy from parser is interpreting varint for values <= -2^63 ?
>>
>> Thanks for any input
>> Boris
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
> --
>
> STEFANIA ALBORGHETTI
>
> Software engineer | +852 6114 9265 |
> stefania.alborghe...@datastax.com
>
>
>
>
>


-- 

STEFANIA ALBORGHETTI

Software engineer | +852 6114 9265 | stefania.alborghe...@datastax.com


Re: Copy from CSV on OS X problem with varint values <= -2^63

2017-04-04 Thread Stefania Alborghetti
I've reproduced the same problem on Linux, and I've opened CASSANDRA-13408
<https://issues.apache.org/jira/browse/CASSANDRA-13408>. As a workaround,
disable prepared statements and it will work (WITH HEADER = TRUE AND
PREPAREDSTATEMENTS = False).
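
For example, applied to the COPY command quoted below (table, columns and
file name taken from that command), the workaround would look something like:

COPY mykeyspace.data (id, varint) FROM 'data.csv' WITH HEADER = TRUE AND
PREPAREDSTATEMENTS = False;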

On Tue, Apr 4, 2017 at 5:02 PM, Boris Babic <bo...@icloud.com> wrote:

>
> On Apr 4, 2017, at 7:00 PM, Boris Babic <bo...@icloud.com> wrote:
>
> Hi
>
> I’m testing the write of various datatypes on OS X for fun running
> cassandra 3.10 on a single laptop instance, and from what i can see varint
> should map to java.math.BigInteger and have no problems with Long.MIN_VALUE
> , -9223372036854775808, but i can’t see what I’m doing wrong.
>
> cqlsh: 5.0.1
> cassandra 3.10
> osx el capitan.
>
> data.csv:
>
> id,varint
> -2147483648,-9223372036854775808
> 2147483647,9223372036854775807
>
> COPY mykeyspace.data (id,varint) FROM 'data.csv' WITH HEADER=true;
>
> *  Failed to make batch statement: Received an argument of invalid
> type for column "varint". Expected:  'cassandra.cqltypes.IntegerType'>, Got: ; (descriptor
> 'bit_length' requires a 'int' object but received a 'long’)*
>
> If I directly type a similar insert in cqlsh no such problem occurs, in
> fact I can make the value many orders of magnitude less and all is fine.
>
> cqlsh> insert into mykeyspace.data (id,varint) 
> values(1,-9223372036854775808898989898)
> ;
>
> Had not observed this before on other OS, is this something todo with the
> way the copy from parser is interpreting varint for values <= -2^63 ?
>
> Thanks for any input
> Boris
>
>
>
>
>
>
>
>
>
>


-- 

STEFANIA ALBORGHETTI

Software engineer | +852 6114 9265 | stefania.alborghe...@datastax.com


Re: COPY FROM performance

2017-03-14 Thread Stefania Alborghetti
Launch cqlsh with the "--debug" option: cqlsh --debug. You should see
which Python driver it is using. My guess is that it is not using the
installed driver, which by default should be Cythonized, but it is still
using the embedded driver.

This is what is shown on my machine for the embedded driver:

Using CQL driver: 

And this is for an installed driver:

Using CQL driver: 

You should also notice that cqlsh takes a little bit longer to start when
using an external driver.

If you want to double-check whether the driver is Cythonized (any installed
driver should be), you can run "unzip -l" on the egg, for example:

unzip -l
/usr/local/lib/python2.7/dist-packages/cassandra_driver-3.7.1.post0-py2.7-linux-x86_64.egg

You should see files with the extensions ".pyc" and ".so"; they indicate that
the driver was compiled with Cython.

Lastly, regarding point 3, Cython does not ship with C*. However, you don't
need it unless you also want to compile *pylib/cqlshlib/copyutil.py* with
Cython. In my opinion this is not worth it: the biggest improvement comes
from compiling the driver with Cython, at least in the tests I did.
Therefore, your steps above look correct.

It would be very odd to see no performance gain with a Cythonized driver,
but as I said, performance depends on the schema. Perhaps your schema has
complex types such as collections, where parsing is the dominant factor -
although in that case I would have expected cassandra-loader to outperform
COPY FROM. The parsing is done by *copyutil.py*, by the way, in which case
you may well want to Cythonize it too (instructions are in the blog), but
because the Python code in *copyutil.py* is not optimized for Cython, don't
expect huge gains.

If you want to move the burden of parsing from the client to the cluster,
you can do so with PREPAREDSTATEMENTS=False, but I only recommend this if
the cluster machines are idle.

Finally, make sure to try out some COPY FROM parameters, especially
NUMPROCESSES
and CHUNKSIZE. For the first parameter, observe the CPU and increase the
number of processes if you have idle CPU on the client, decrease it if the
CPU is blocked (dstat -lvrn 10). As for CHUNKSIZE, because it is in rows,
it may be that for your schema the ideal value is higher or smaller than
5,000, so try different values, such as 500 and 50,000, and see what this
does to performance.
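
As an illustration only (the keyspace, table, file name and values below are
placeholders to be tuned for your workload), the options can be combined
like this:

COPY myks.mytable FROM 'data.csv' WITH NUMPROCESSES=8 AND CHUNKSIZE=1000;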



On Tue, Mar 14, 2017 at 9:23 PM, Artur R <ar...@gpnxgroup.com> wrote:

> HI!
>
> I am trying to increase performance of COPY FROM by installing "*Cython
> <http://cython.org/> and libev
> <http://software.schmorp.de/pkg/libev.html> C extensions"* as
> described here: https://www.datastax.com/dev/blog/six-
> parameters-affecting-cqlsh-copy-from-performance.
>
> My steps are as the following:
>
>1. Install Cassandra 3.10, start it, add keyspace and table
>
>2. Install C Extensions:
>sudo apt-get install gcc python-dev
>
>3. Don't install Cython because as far as I understand it ships with
>C* 3 by default on step 1), so skip it
>
>4. Install libev:
>sudo apt-get install libev4 libev-dev
>
>5. Reinstall C* driver (because as far as I understand it shipped with
>C* on step 1):
>sudo pip install cassandra-driver
>
>6. export CQLSH_NO_BUNDLED=TRUE
>
>7. cqlsh to node and try COPY FROM
>
> And after all these steps above performance of COPY FROM is the same as
> before.
> I tested it with single node cluster and with multiple nodes cluster - it
> doesn't impact on performance.
> However, I see that COPY FROM is CPU bounded on my machines, so these
> steps should definitely increase the performance.
>
>
> The question: what I did wrong? Maybe some step is missed.
> How to check that COPY FROM really uses "*Cython
> <http://cython.org/> and libev
> <http://software.schmorp.de/pkg/libev.html> C extensions"* ?
>



-- 

STEFANIA ALBORGHETTI

Software engineer | +852 6114 9265 | stefania.alborghe...@datastax.com


Re: HELP with bulk loading

2017-03-09 Thread Stefania Alborghetti
When I tested cqlsh COPY FROM for CASSANDRA-11053
<https://issues.apache.org/jira/browse/CASSANDRA-11053?focusedCommentId=15162800=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15162800>,
I was able to import about 20 GB in under 4 minutes on a cluster with 8
nodes using the same benchmark created for cassandra-loader, provided the
driver was Cythonized, instructions in this blog post
<http://www.datastax.com/dev/blog/six-parameters-affecting-cqlsh-copy-from-performance>.
The performance was similar to cassandra-loader.

Depending on your schema, one or the other may do slightly better.

On Fri, Mar 10, 2017 at 8:11 AM, Ryan Svihla <r...@foundev.pro> wrote:

> I suggest using cassandra loader
>
> https://github.com/brianmhess/cassandra-loader
>
> On Mar 9, 2017 5:30 PM, "Artur R" <ar...@gpnxgroup.com> wrote:
>
>> Hello all!
>>
>> There are ~500gb of CSV files and I am trying to find the way how to
>> upload them to C* table (new empty C* cluster of 3 nodes, replication
>> factor 2) within reasonable time (say, 10 hours using 3-4 instance of
>> c3.8xlarge EC2 nodes).
>>
>> My first impulse was to use CQLSSTableWriter, but it is too slow is of
>> single instance and I can't efficiently parallelize it (just creating Java
>> threads) because after some moment it always "hangs" (looks like GC is
>> overstressed) and eats all available memory.
>>
>> So the questions are:
>> 1. What is the best way to bulk-load huge amount of data to new C*
>> cluster?
>>
>> This comment here: https://issues.apache.org/jira/browse/CASSANDRA-9323:
>>
>> The preferred way to bulk load is now COPY; see CASSANDRA-11053
>>> <https://issues.apache.org/jira/browse/CASSANDRA-11053> and linked
>>> tickets
>>
>>
>> is confusing because I read that the CQLSSTableWriter + sstableloader is
>> much faster than COPY. Who is right?
>>
>> 2. Is there any real examples of multi-threaded using of CQLSSTableWriter?
>> Maybe ready to use libraries like: https://github.com/spotify/hdfs2cass?
>>
>> 3. sstableloader is slow too. Assuming that I have new empty C* cluster,
>> how can I improve the upload speed? Maybe disable replication or some other
>> settings while streaming and then turn it back?
>>
>> Thanks!
>> Artur.
>>
>


-- 

STEFANIA ALBORGHETTI

Software engineer | +852 6114 9265 | stefania.alborghe...@datastax.com


Re: Python Upgrade to 2.7

2016-12-21 Thread Stefania Alborghetti
Python is missing the zlib module.

The solution to this problem depends on whether you've compiled Python from
source, or are using a distribution package.

Googling the error "can't decompress data; zlib not available" should
provide an answer on how to solve this. If not, send us more details on
your Python installation.
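
As a quick check (a sketch only; the interpreter path below is taken from
your output), you can verify whether the zlib module is importable in the
Python that cqlsh uses:

/opt/isv/python27/bin/python -c "import zlib; print zlib.__name__"

If that fails with a similar error, zlib is missing from that Python build;
for a Python compiled from source, rebuilding it with the zlib development
headers installed usually fixes it.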

On Thu, Dec 22, 2016 at 6:51 AM, Jacob Shadix <jacobsha...@gmail.com> wrote:

> I am running Cassandra 2.1.14. Upgraded to Python 2.7 from 2.6.6 and
> getting the following error with CQLSH.
> ---
>
> Python Cassandra driver not installed, or not on PYTHONPATH.
>
> You might try "pip install cassandra-driver".
>
> Python: /opt/isv/python27/bin/python
>
> Error: can't decompress data; zlib not available
>
> ---
>
> What am I missing?
> -- Jacob Shadix
>



-- 


Stefania Alborghetti

|+852 6114 9265| stefania.alborghe...@datastax.com


Re: Problem with COPY FROM/TO and counters

2016-11-13 Thread Stefania Alborghetti
Thanks for reporting this, I've opened CASSANDRA-12909
<https://issues.apache.org/jira/browse/CASSANDRA-12909> with all the
details.

You can apply the patch linked in that ticket if you want a quick
workaround, but the root cause is still not fully understood.

The reason why only counters are affected is that when there are counters,
we do not use prepared statements, and it seems the problem only occurs with
non-prepared statements.

On Fri, Nov 11, 2016 at 4:22 PM, Jaroslav Kameník <jaros...@kamenik.cz>
wrote:

> Hi guys,
>
> we are making a simple tool which allows us to transform table
> via COPY TO -> drop table -> transform schema/data -> create table -> COPY
> FROM.
>
> It works well in most cases, but we have problem with loading of counter
> columns, it fails with "ParseError - argument for 's' must be a string,
>  given up without retries.".
> It works well if the same column is defined as int.
>
> Are we doing something wrong or encountered C* error?
>
>
> Thanks,
>
> Jaroslav
>
>
>
> [/tmp]$ echo 
> "EVT:be3bd2d0-a68d-11e6-90d4-1b2a65b8a28a,f7ce3ac0-a66e-11e6-b58e-4e29450fd577,SA,2"
> > data.csv
> [/tmp]$ cqlsh
> Connected to WOC at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.10-SNAPSHOT | CQL spec 3.4.3 | Native protocol
> v4]
> Use HELP for help.
> cqlsh> CREATE TABLE woc.table_test (
>... object_id ascii,
>... user_id timeuuid,
>... counter_id ascii,
>... count counter,
>... PRIMARY KEY ((object_id, user_id), counter_id)
>... );
> cqlsh>quit;
> [/tmp]$ cqlsh -e "copy woc.table_test(object_id, user_id, counter_id,
> count) from 'data.csv';"
> Using 7 child processes
>
> Starting copy of woc.table_test with columns [object_id, user_id,
> counter_id, count].
> :1:Failed to import 1 rows: ParseError - argument for 's' must be a
> string,  given up without retries
> :1:Failed to process 1 rows; failed rows written to
> import_woc_table_test.err
> Processed: 1 rows; Rate:   1 rows/s; Avg. rate:   2 rows/s
> 1 rows imported from 1 files in 0.560 seconds (0 skipped).
> [/tmp]$
> [/tmp]$
> [/tmp]$
> [/tmp]$
> [/tmp]$ cqlsh
> Connected to WOC at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.10-SNAPSHOT | CQL spec 3.4.3 | Native protocol
> v4]
> Use HELP for help.
> cqlsh> drop table woc.table_test;
> cqlsh>
> cqlsh> CREATE TABLE woc.table_test (
>... object_id ascii,
>... user_id timeuuid,
>... counter_id ascii,
>... count int,
>... PRIMARY KEY ((object_id, user_id), counter_id));
> cqlsh>quit;
> [/tmp]$ cqlsh -e "copy woc.table_test(object_id, user_id, counter_id,
> count) from 'data.csv';"
> Using 7 child processes
>
> Starting copy of woc.table_test with columns [object_id, user_id,
> counter_id, count].
> Processed: 1 rows; Rate:   1 rows/s; Avg. rate:   2 rows/s
> 1 rows imported from 1 files in 0.652 seconds (0 skipped).
> [/tmp]$
>
>
>
>
>
>  data
>
> echo 
> "EVT:be3bd2d0-a68d-11e6-90d4-1b2a65b8a28a,f7ce3ac0-a66e-11e6-b58e-4e29450fd577,SA,2"
> > data.csv
>
>
>  table definitions, first one is with counter column, second with
> int column
>
>
> CREATE TABLE woc.table_test (
> object_id ascii,
> user_id timeuuid,
> counter_id ascii,
> count counter,
> PRIMARY KEY ((object_id, user_id), counter_id)
> );
>
> DROP TABLE woc.table_test;
>
> CREATE TABLE woc.table_test (
> object_id ascii,
> user_id timeuuid,
> counter_id ascii,
> count int,
> PRIMARY KEY ((object_id, user_id), counter_id)
> );
>



-- 


Stefania Alborghetti

|+852 6114 9265| stefania.alborghe...@datastax.com


Re: Cannot set TTL in COPY command

2016-10-26 Thread Stefania Alborghetti
The TTL option for the COPY command was only added in 3.2, CASSANDRA-9494
<https://issues.apache.org/jira/browse/CASSANDRA-9494>.
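
On 3.2 or later it can be passed like any other COPY option, for example
(the keyspace, table and TTL value below are placeholders):

COPY myks.mytable FROM 'data.csv' WITH TTL = 86400;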

On Thu, Oct 27, 2016 at 3:28 AM, Harikrishnan Pillai <
hpil...@walmartlabs.com> wrote:

> i have created a Jira for Cassandra version 3.9.Anyone have seen this
> scenario before in any 3.X version.
>
> https://issues.apache.org/jira/browse/CASSANDRA-12844
>
> Regards
>
> Hari
> --
> *From:* Lahiru Gamathige <lah...@highfive.com>
> *Sent:* Wednesday, October 26, 2016 10:46:51 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cannot set TTL in COPY command
>
> Highly recommend to move to a newer Cassandra version first because TTL
> and compaction are much more consistent.
> On Wed, Oct 26, 2016 at 10:36 AM, Tyler Hobbs <ty...@datastax.com> wrote:
>
>>
>> On Wed, Oct 26, 2016 at 10:07 AM, techpyaasa . <techpya...@gmail.com>
>> wrote:
>>
>>> Can some one please tell me how to set TTL using COPY command?
>>
>>
>> It looks like you're using Cassandra 2.0.  I don't think COPY supports
>> the TTL option until at least 2.1.
>>
>>
>> --
>> Tyler Hobbs
>> DataStax <http://datastax.com/>
>>
>
>


-- 


Stefania Alborghetti

|+852 6114 9265| stefania.alborghe...@datastax.com


Re: CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation

2016-10-24 Thread Stefania Alborghetti
I'm sure you can share the schema and data privately with the ticket
assignee, when the ticket gets assigned and looked at.

If it was a schema change problem, you can try going back to the old schema
if you can recall what it was, but I cannot guarantee this would work
without knowing the root cause. Same thing regarding which release to try,
without knowing the root cause, it's really not possible to advise a
specific release.

The easiest thing to do is to skip the mutations with problems. You still
lose some data, but at least not all data. If you see this in your logs:

Replay stopped. If you wish to override this error and continue starting
the node ignoring commit log replay problems, specify
-Dcassandra.commitlog.ignorereplayerrors=true on the command line.

Then it means that you can start Cassandra with
-Dcassandra.commitlog.ignorereplayerrors=true
and it will carry on even if it cannot parse some mutations, which will be
saved in the /tmp folder.
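
For example (a sketch; how you pass JVM system properties depends on how you
start Cassandra), you can either start it with the flag on the command line:

bin/cassandra -Dcassandra.commitlog.ignorereplayerrors=true

or add it temporarily to the JVM options in conf/cassandra-env.sh:

JVM_OPTS="$JVM_OPTS -Dcassandra.commitlog.ignorereplayerrors=true"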

If it was a schema change problem, then you shouldn't need to start with
this property more than once. If the problem persists with new commit log
segments, then it's definitely another problem and you should really open a
ticket.

On Tue, Oct 25, 2016 at 10:36 AM, kurt Greaves <k...@instaclustr.com> wrote:

>
> On 25 October 2016 at 01:34, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> I want some of the newer UDT features, like not needing to have frozen
>> UDTs
>
>
> You can try Instaclustr's 3.7 LTS release which is just 3.7 with some
> backported fixes from later versions. If you absolutely need those new
> features it's probably your best bet (until 4.0), however note that it's
> still 3.7 and likely less stable than the latest 3.0.x releases.
>
> https://github.com/instaclustr/cassandra
>
> Read the README at the repo for more info.
>
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
>



-- 


Stefania Alborghetti

|+852 6114 9265| stefania.alborghe...@datastax.com


Re: CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation

2016-10-24 Thread Stefania Alborghetti
Did the schema change? This would be 12397.

If not, and if you don't mind sharing the data, or you have the steps to
reproduce it, could you please open a ticket so it can be looked at? You
need to attach the schema as well.

On Mon, Oct 24, 2016 at 9:33 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> Its 'text'.  Don't know the answer of the 2nd question.
>
> On Mon, Oct 24, 2016 at 6:31 PM, Jonathan Haddad <j...@jonhaddad.com>
> wrote:
>
>> What type is board id? Is the value a tombstone?
>>
>> On Mon, Oct 24, 2016 at 1:38 AM Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> Thanks, but I did come across those, it doesn't look like they provide a
>>> resolution.
>>>
>>> On Mon, Oct 24, 2016 at 1:36 PM, DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>> You may read those:
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-12121
>>> https://issues.apache.org/jira/browse/CASSANDRA-12397
>>>
>>> On Mon, Oct 24, 2016 at 10:24 AM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>> Any workarounds that don't involve me having to figure out how to
>>> uninstall and re-install a different version?
>>>
>>> On Mon, Oct 24, 2016 at 1:24 PM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>> 3.9..
>>>
>>> On Mon, Oct 24, 2016 at 1:22 PM, DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>> Which version of C* ? There was similar issues with commitlogs in
>>> tic-toc versions.
>>>
>>> On Mon, Oct 24, 2016 at 4:18 AM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>> I have a single node cassandra installation on my dev laptop, which is
>>> used just for dev / testing.
>>>
>>> Recently, whenever I restart my laptop, Cassandra fails to start when I
>>> run it via 'sudo service cassandra start'.
>>>
>>> Doing a tail on /var/log/cassandra/system.log gives this log:
>>>
>>> *INFO  [main] 2016-10-24 07:08:02,950 CommitLog.java:166 - Replaying
>>> /var/lib/cassandra/commitlog/CommitLog-6-1476907676969.log,
>>> /var/lib/cassandra/commitlog/CommitLog-6-1476907676970.log,
>>> /var/lib/cassandra/commitlog/CommitLog-6-1477269052845.log*
>>> *ERROR [main] 2016-10-24 07:08:03,357 JVMStabilityInspector.java:82 -
>>> Exiting due to error while processing commit log during initialization.*
>>> *org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
>>> Unexpected error deserializing mutation; saved to
>>> /tmp/mutation9186356142128811141dat.  This may be caused by replaying a
>>> mutation against a table with the same name but incompatible schema.
>>> Exception follows: org.apache.cassandra.serializers.MarshalException: Not
>>> enough bytes to read 0th field board_id*
>>> * at
>>> org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:410)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:343)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:202)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:85)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:135)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:187)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:167)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:323)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:730)
>>> [apache-cassandra-3.9.jar:3.9]*
>>>
>>>
>>> I then have to do 'sudo rm -rf /var/lib/cassandra/commitlog/*' which
>>> fixes the problem, but then I lose all of my data.
>>>
>>> It looks like its saying there wasn't enough data to read the field
>>> 'board_id', any ideas why that would be?
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>


-- 


Stefania Alborghetti

|+852 6114 9265| stefania.alborghe...@datastax.com


Re: Hadoop vs Cassandra

2016-10-24 Thread Stefania Alborghetti
If you intend to use files on HDFS, I would recommend using Parquet files.
It's a very fast columnar format that allows querying data very
efficiently. I believe a Spark data frame will take care of saving all the
columns in a Parquet file. So you could extract the data from Cassandra via
the Spark connector and save it to Parquet.

Or you can query Cassandra data directly from Spark, but it won't be as
fast as Parquet.

It's a trade-off between how much data to save to Parquet, how often, how
many queries, what format and whether you can tolerate some stale data.


On Sun, Oct 23, 2016 at 7:18 PM, Welly Tambunan <if05...@gmail.com> wrote:

> Another thing is,
>
> Let's say that we already have a structure data, the way we load that to
> HDFS is to turn that one into a files ?
>
> Cheers
>
> On Sun, Oct 23, 2016 at 6:18 PM, Welly Tambunan <if05...@gmail.com> wrote:
>
>> So basically you will store that files to HDFS and use Spark to process
>> it ?
>>
>> On Sun, Oct 23, 2016 at 6:03 PM, Joaquin Alzola <
>> joaquin.alz...@lebara.com> wrote:
>>
>>>
>>>
>>> I think what Ali mentions is correct:
>>>
>>> If you need a lot of queries that require joins, or complex analytics of
>>> the kind that Cassandra isn't suited for, then HDFS / HBase may be better.
>>>
>>>
>>>
>>> We have files in which one line contains 500 fields (separated by pipe)
>>> and each of this fields is particularly important.
>>>
>>> Cassandra will not manage that since you will need 500 indexes. HDFS is
>>> the proper way.
>>>
>>>
>>>
>>>
>>>
>>> *From:* Welly Tambunan [mailto:if05...@gmail.com]
>>> *Sent:* 23 October 2016 10:19
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: Hadoop vs Cassandra
>>>
>>>
>>>
>>> I like muti data centre resillience in cassandra.
>>>
>>> I think thats plus one for cassandra.
>>>
>>> Ali, complex analytics can be done in spark right?
>>>
>>> On 23 Oct 2016 4:08 p.m., "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>>>
>>> >
>>>
>>> > I would say it depends on your use case.
>>> >
>>> > If you need a lot of queries that require joins, or complex analytics
>>> of the kind that Cassandra isn't suited for, then HDFS / HBase may be
>>> better.
>>> >
>>> > If you can work with the cassandra way of doing things (creating new
>>> tables for each query you'll need to do, duplicating data - doing extra
>>> writes for faster reads) , then Cassandra should work for you. It is easier
>>> to setup and do dev ops with, in my experience.
>>> >
>>> > On Sun, Oct 23, 2016 at 2:05 PM, Welly Tambunan <if05...@gmail.com>
>>> wrote:
>>>
>>> >>
>>>
>>> >> I mean. HDFS and HBase.
>>> >>
>>> >> On Sun, Oct 23, 2016 at 4:00 PM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>> >>>
>>>
>>> >>> By Hadoop do you mean HDFS?
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Sun, Oct 23, 2016 at 1:56 PM, Welly Tambunan <if05...@gmail.com>
>>> wrote:
>>>
>>> >>>>
>>>
>>> >>>> Hi All,
>>> >>>>
>>> >>>> I read the following comparison between hadoop and cassandra. Seems
>>> the conclusion that we use hadoop for data lake ( cold data ) and Cassandra
>>> for hot data (real time data).
>>> >>>>
>>> >>>> http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop
>>> <http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop>
>>> >>>>
>>> >>>> My question is, can we just use cassandra to rule them all ?
>>> >>>>
>>> >>>> What we are trying to achieve is to minimize the moving part on our
>>> system.
>>> >>>>
>>> >>>> Any response would be really appreciated.
>>> >>>>
>>> >>>>
>>> >>>> Cheers
>>> >>>>
>>> >>>> --
>>> >>>> Welly Tambunan
>>> >>>> Triplelands
>>> >>>>
>>> >>>> http://weltam.wordpress.com <http://weltam.wordpress.com>
>>> >>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Welly Tambunan
>>> >> Triplelands
>>> >>
>>> >> http://weltam.wordpress.com <http://weltam.wordpress.com>
>>> >> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>> >
>>> >
>>> This email is confidential and may be subject to privilege. If you are
>>> not the intended recipient, please do not copy or disclose its content but
>>> contact the sender immediately upon receipt.
>>>
>>
>>
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com <http://www.triplelands.com/blog/>
>



-- 


Stefania Alborghetti

|+852 6114 9265| stefania.alborghe...@datastax.com


Re: How to insert "Empty" timeuuid by Cql

2016-10-19 Thread Stefania Alborghetti
You're correct, cassandra 2.1 is still using protocol version 3. You need
at least version 2.2.

On Thu, Oct 20, 2016 at 11:18 AM, Lijun Huang <coder...@gmail.com> wrote:

> Thanks Stefania, we haven't tried before, and I think the version is not
> matched, we are still using,
> [cqlsh 4.1.1 | Cassandra 2.1.11 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
>
> On Thu, Oct 20, 2016 at 10:33 AM, Stefania Alborghetti <
> stefania.alborghe...@datastax.com> wrote:
>
>> Have you already tried using unset values?
>>
>> http://www.datastax.com/dev/blog/datastax-java-driver-3-0-0-
>> released#unset-values
>>
>> They are only available starting with protocol version 4 however.
>>
>> On Thu, Oct 20, 2016 at 10:19 AM, Lijun Huang <coder...@gmail.com> wrote:
>>
>>> Hi Vladimir,
>>>
>>> Indeed, that's a little weird, I think it is like a empty string: '' but
>>> is a timeuuid value. We have many such records that inserted by Astyanax
>>> API, when we select it in cqlsh, it is like as below, note the column4 is
>>> timeuuid, it is not null or some value, just "empty".
>>>
>>> key  | column1  | column2 | column3 | column4 | value
>>> --+++
>>> ++--
>>> test by thrift | accessState |  |  |
>>>| 0x5
>>>
>>> But when we use Cql, we couldn't set this empty value, it is null or
>>> explicit value, like below,
>>>
>>> key  | column1  | column2 | column3 | column4  | value
>>> --+---+-+---
>>> --+--+--
>>>  test by cql   | accessState |  | |  null
>>>   | 0x5
>>>
>>> key  | column1  | column2 | column3 |
>>> column4  | value
>>> ---+--+--+--
>>> --+-
>>> --+-
>>>  test by cql   | accessState |  | |
>>> 4a528300-95cb-11e6-8650-0242f5eaa8c3| 0x5
>>>
>>> I don't know whether you could understand now, if not I could provide
>>> some code related to Astyanax. Really appreciate your help.
>>>
>>>
>>> On Wed, Oct 19, 2016 at 9:53 PM, Vladimir Yudovin <vla...@winguzone.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> what does it exactly mean 'empty timeuuid'?  UUID takes 16 bytes for
>>>> storage, so it should be either null, or some value. Do you mean 'zero'
>>>> UUID?
>>>>
>>>> Best regards, Vladimir Yudovin,
>>>>
>>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
>>>> CassandraLaunch your cluster in minutes.*
>>>>
>>>>
>>>>  On Wed, 19 Oct 2016 09:16:29 -0400*coderhlj <coder...@gmail.com
>>>> <coder...@gmail.com>>* wrote 
>>>>
>>>> Hi all,
>>>>
>>>> We use Cassandra 2.1.11 in our product, and we update the Java Drive
>>>> from Astyanax(Thrift API) to DataStax Java Driver(Cql) recently, but we
>>>> encounter a difficult issue as following, please help us, thanks in 
>>>> advance.
>>>>
>>>> Previously we were using Astyanax API, and we can insert empty timeuuid
>>>> like below, but now we can only insert null timeuuid by cql command but not
>>>> empty one. Is there any cql function to insert an empty timeuuid like
>>>> by Astyanax?
>>>> And this cause a tough problem is that we can not delete the record by
>>>> specifying the primary key, like:
>>>> *delete from "Foo" where column1='test' and column2='accessState' and
>>>> column3='' and column4=(need fill empty uuid here) IF EXISTS;*
>>>>
>>>> key  | column1  | column2 | column3 | column4 | value
>>>> -+-+-+-
>>>> +-+--
>>>> test by thrift | accessState |  |  |
>>>>| 0x5
>>>>
>>>> key  | column1  | column2 | column3 | column4  | value
>>>> -+-+-+-+--+-
>>>> -
>>>>  test by 

Re: How to insert "Empty" timeuuid by Cql

2016-10-19 Thread Stefania Alborghetti
Have you already tried using unset values?

http://www.datastax.com/dev/blog/datastax-java-driver-3-0-0-released#unset-values

They are only available starting with protocol version 4 however.

On Thu, Oct 20, 2016 at 10:19 AM, Lijun Huang <coder...@gmail.com> wrote:

> Hi Vladimir,
>
> Indeed, that's a little weird, I think it is like a empty string: '' but
> is a timeuuid value. We have many such records that inserted by Astyanax
> API, when we select it in cqlsh, it is like as below, note the column4 is
> timeuuid, it is not null or some value, just "empty".
>
> key  | column1  | column2 | column3 | column4 | value
> --+++
> ++--
> test by thrift | accessState |  |  |  |
> 0x5
>
> But when we use Cql, we couldn't set this empty value, it is null or
> explicit value, like below,
>
> key  | column1  | column2 | column3 | column4  | value
> --+---+-+---
> --+--+--
>  test by cql   | accessState |  | |  null |
> 0x5
>
> key  | column1  | column2 | column3 |
> column4  | value
> ---+--+--+--
> --+-
> --+-
>  test by cql   | accessState |  | |
> 4a528300-95cb-11e6-8650-0242f5eaa8c3| 0x5
>
> I don't know whether you could understand now, if not I could provide some
> code related to Astyanax. Really appreciate your help.
>
>
> On Wed, Oct 19, 2016 at 9:53 PM, Vladimir Yudovin <vla...@winguzone.com>
> wrote:
>
>> Hi,
>>
>> what does it exactly mean 'empty timeuuid'?  UUID takes 16 bytes for
>> storage, so it should be either null, or some value. Do you mean 'zero'
>> UUID?
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
>> CassandraLaunch your cluster in minutes.*
>>
>>
>>  On Wed, 19 Oct 2016 09:16:29 -0400*coderhlj <coder...@gmail.com
>> <coder...@gmail.com>>* wrote 
>>
>> Hi all,
>>
>> We use Cassandra 2.1.11 in our product, and we update the Java Drive from
>> Astyanax(Thrift API) to DataStax Java Driver(Cql) recently, but we
>> encounter a difficult issue as following, please help us, thanks in advance.
>>
>> Previously we were using Astyanax API, and we can insert empty timeuuid
>> like below, but now we can only insert null timeuuid by cql command but not
>> empty one. Is there any cql function to insert an empty timeuuid like by
>> Astyanax?
>> And this cause a tough problem is that we can not delete the record by
>> specifying the primary key, like:
>> *delete from "Foo" where column1='test' and column2='accessState' and
>> column3='' and column4=(need fill empty uuid here) IF EXISTS;*
>>
>> key  | column1  | column2 | column3 | column4 | value
>> -+-+-+-
>> +-+--
>> test by thrift | accessState |  |  |  |
>> 0x5
>>
>> key  | column1  | column2 | column3 | column4  | value
>> -+-+-+-+--+-
>> -
>>  test by cql   | accessState |  | |  null
>>   | 0x5
>>
>>
>> cqlsh:StorageOS> desc table "Foo";
>>
>> CREATE TABLE "Foo" (
>>   key text,
>>   column1 text,
>>   column2 text,
>>   column3 text,
>>   column4 timeuuid,
>>   value blob,
>>   PRIMARY KEY (key, column1, column2, column3, column4)
>> ) WITH COMPACT STORAGE AND
>>   bloom_filter_fp_chance=0.01 AND
>>   caching='{"keys":"ALL", "rows_per_partition":"NONE"}' AND
>>   comment='' AND
>>   dclocal_read_repair_chance=0.10 AND
>>   gc_grace_seconds=432000 AND
>>   read_repair_chance=0.00 AND
>>   default_time_to_live=0 AND
>>   speculative_retry='NONE' AND
>>   memtable_flush_period_in_ms=0 AND
>>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>>   compression={'sstable_compression': 'LZ4Compressor'};
>>
>> --
>> Thanks,
>> Lijun Huang
>>
>>
>>
>
>
> --
> Best regards,
> Lijun Huang
>



-- 


Stefania Alborghetti

|+852 6114 9265| stefania.alborghe...@datastax.com


Re: cqlsh problem

2016-10-13 Thread Stefania Alborghetti
e, I believe you are having network
>>>>>>>>>>>>> issue of some kind.
>>>>>>>>>>>>>
>>>>>>>>>>>>> MacBook-Pro:~ alain$ cqlsh --version
>>>>>>>>>>>>> cqlsh 5.0.1
>>>>>>>>>>>>> MacBook-Pro:~ alain$ echo 'DESCRIBE KEYSPACES;' | cqlsh
>>>>>>>>>>>>> --connect-timeout=5 --request-timeout=10
>>>>>>>>>>>>> system_traces  system
>>>>>>>>>>>>> MacBook-Pro:~ alain$
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's been a few days, did you manage to fix it ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> C*heers,
>>>>>>>>>>>>> ---
>>>>>>>>>>>>> Alain Rodriguez - al...@thelastpickle.com
>>>>>>>>>>>>> France
>>>>>>>>>>>>>
>>>>>>>>>>>>> The Last Pickle - Apache Cassandra Consulting
>>>>>>>>>>>>> http://www.thelastpickle.com
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2016-03-21 9:59 GMT+01:00 joseph gao <gaojf.bok...@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> cqlsh version 5.0.1. nodetool tpstats looks good, log looks
>>>>>>>>>>>>>> good. And I used specified port 9042. And it immediately returns 
>>>>>>>>>>>>>> fail (less
>>>>>>>>>>>>>> than 3 seconds). By the way where should I use 
>>>>>>>>>>>>>> '--connect-timeout', cqlsh
>>>>>>>>>>>>>> seems don't have such parameters.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2016-03-18 17:29 GMT+08:00 Alain RODRIGUEZ <
>>>>>>>>>>>>>> arodr...@gmail.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is the node fully healthy or rejecting some requests ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What are the outputs for "grep -i "ERROR"
>>>>>>>>>>>>>>> /var/log/cassandra/system.log" and "nodetool tpstats"?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any error? Any pending / blocked or dropped messages?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also did you try using distinct ports (9160 for thrift, 9042
>>>>>>>>>>>>>>> for native) - out of curiosity, not sure this will help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What is your version of cqlsh "cqlsh --version" ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> doesn't work most times. But some time it just work fine
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do you fill like this is due to a timeout (query being too
>>>>>>>>>>>>>>> big, cluster being to busy)? Try setting this higher:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --connect-timeout=CONNECT_TIMEOUT
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Specify the connection timeout in
>>>>>>>>>>>>>>> seconds (default: 5 seconds).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   --request-timeout=REQUEST_TIMEOUT
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Specify the default request timeout
>>>>>>>>>>>>>>> in seconds (default: 10 seconds).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> C*heers,
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>> Alain Rodriguez - al...@thelastpickle.com
>>>>>>>>>>>>>>> France
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The Last Pickle - Apache Cassandra Consulting
>>>>>>>>>>>>>>> http://www.thelastpickle.com
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2016-03-18 4:49 GMT+01:00 joseph gao <gaojf.bok...@gmail.com
>>>>>>>>>>>>>>> >:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Of course yes.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2016-03-17 22:35 GMT+08:00 Vishwas Gupta <
>>>>>>>>>>>>>>>> vishwas.gu...@snapdeal.com>:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Have you started the Cassandra service?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> sh cassandra
>>>>>>>>>>>>>>>>> On 17-Mar-2016 7:59 pm, "Alain RODRIGUEZ" <
>>>>>>>>>>>>>>>>> arodr...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi, did you try with the address of the node rather than
>>>>>>>>>>>>>>>>>> 127.0.0.1
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Is the transport protocol used by cqlsh (not sure if it
>>>>>>>>>>>>>>>>>> is thrift or binary - native in 2.1)  active ? What is the 
>>>>>>>>>>>>>>>>>> "nodetool info"
>>>>>>>>>>>>>>>>>> output ?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> C*heers,
>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>> Alain Rodriguez - al...@thelastpickle.com
>>>>>>>>>>>>>>>>>> France
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The Last Pickle - Apache Cassandra Consulting
>>>>>>>>>>>>>>>>>> http://www.thelastpickle.com
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2016-03-17 14:26 GMT+01:00 joseph gao <
>>>>>>>>>>>>>>>>>> gaojf.bok...@gmail.com>:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> hi, all
>>>>>>>>>>>>>>>>>>> cassandra version 2.1.7
>>>>>>>>>>>>>>>>>>> When I use cqlsh to connect cassandra, something is wrong
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Connection error: ( Unable to connect to any servers',
>>>>>>>>>>>>>>>>>>> {'127.0.0.1': OperationTimedOut('errors=None,
>>>>>>>>>>>>>>>>>>> last_host=None,)})
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This happens lots of times, but sometime it works just
>>>>>>>>>>>>>>>>>>> fine. Anybody knows why?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> Joseph Gao
>>>>>>>>>>>>>>>>>>> PhoneNum:15210513582
>>>>>>>>>>>>>>>>>>> QQ: 409343351
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Joseph Gao
>>>>>>>>>>>>>>>> PhoneNum:15210513582
>>>>>>>>>>>>>>>> QQ: 409343351
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Joseph Gao
>>>>>>>>>>>>>> PhoneNum:15210513582
>>>>>>>>>>>>>> QQ: 409343351
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> --
>>>>>>>>>>>> Joseph Gao
>>>>>>>>>>>> PhoneNum:15210513582
>>>>>>>>>>>> QQ: 409343351
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> --
>>>>>>>>>>> Joseph Gao
>>>>>>>>>>> PhoneNum:15210513582
>>>>>>>>>>> QQ: 409343351
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> --
>>>>>>>>> Joseph Gao
>>>>>>>>> PhoneNum:15210513582
>>>>>>>>> QQ: 409343351
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Kurt Greaves
>>>>>>>> k...@instaclustr.com
>>>>>>>> www.instaclustr.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> --
>>>>>>> Joseph Gao
>>>>>>> PhoneNum:15210513582
>>>>>>> QQ: 409343351
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> --
>>>>> Joseph Gao
>>>>> PhoneNum:15210513582
>>>>> QQ: 409343351
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> --
> Joseph Gao
> PhoneNum:15210513582
> QQ: 409343351
>



-- 


Stefania Alborghetti

|+852 6114 9265| stefania.alborghe...@datastax.com


Re: COPY TO export fails with

2016-05-10 Thread Stefania Alborghetti
;> Memtable switch count: 72
>>>> Local read count: 0
>>>> Local read latency: NaN ms
>>>> Local write count: 139878
>>>> Local write latency: 0.023 ms
>>>> Pending flushes: 0
>>>> Bloom filter false positives: 0
>>>> Bloom filter false ratio: 0.0
>>>> Bloom filter space used: 6224240
>>>> Bloom filter off heap memory used: 6223592
>>>> Index summary off heap memory used: 1098860
>>>> Compression metadata off heap memory used: 9077088
>>>> Compacted partition minimum bytes: 373
>>>> Compacted partition maximum bytes: 1358102
>>>> Compacted partition mean bytes: 16252
>>>> Average live cells per slice (last five minutes): 0.0
>>>> Maximum live cells per slice (last five minutes): 0.0
>>>> Average tombstones per slice (last five minutes): 0.0
>>>> Maximum tombstones per slice (last five minutes): 0.0
>>>>
>>>>
>>>> Some of the errors:
>>>>
>>>> /export.cql:9:Error for (269754647900342974, 272655475232221549): 
>>>> OperationTimedOut - errors={}, last_host=10.1.12.89 (will try again later 
>>>> attempt 1 of 5)
>>>> /export.cql:9:Error for (-3191598516608295217, -3188807168672208162): 
>>>> OperationTimedOut - errors={}, last_host=10.1.12.89 (will try again later 
>>>> attempt 1 of 5)
>>>> /export.cql:9:Error for (-3066009427947359685, -3058745599093267591): 
>>>> OperationTimedOut - errors={}, last_host=10.1.8.5 (will try again later 
>>>> attempt 1 of 5)
>>>> /export.cql:9:Error for (-1737068099173540127, -1716693115263588178): 
>>>> OperationTimedOut - errors={}, last_host=10.1.8.5 (will try again later 
>>>> attempt 1 of 5)
>>>> /export.cql:9:Error for (-655042025062419794, -627527938552757160): 
>>>> OperationTimedOut - errors={}, last_host=10.1.12.89 (will try again later 
>>>> attempt 1 of 5)
>>>> /export.cql:9:Error for (2441403877625910843, 2445504271098651532): 
>>>> OperationTimedOut - errors={}, last_host=10.1.12.89 (permanently given up 
>>>> after 1000 rows and 1 attempts)
>>>>
>>>>
>>>> …
>>>>
>>>>
>>>>
>>>> --
>>>> Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
>>>> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
>>>> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0)
>>>> 172.1702676
>>>> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
>>>> www.more4fi.de
>>>>
>>>> Sitz der Gesellschaft: Solingen | HRB 25917| Amtsgericht Wuppertal
>>>> Vorstand: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
>>>> Aufsichtsrat: Patric Fedlmeier (Vorsitzender) . Klaus Jäger . Jürgen
>>>> Schütz
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
>>> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
>>> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0)
>>> 172.1702676
>>> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
>>> www.more4fi.de
>>>
>>> Sitz der Gesellschaft: Solingen | HRB 25917| Amtsgericht Wuppertal
>>> Vorstand: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
>>> Aufsichtsrat: Patric Fedlmeier (Vorsitzender) . Klaus Jäger . Jürgen
>>> Schütz
>>>
>>>
>>
>>
>> --
>>
>>
>>
>>
>
>
> --
> Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0)
> 172.1702676
> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
> www.more4fi.de
>
> Sitz der Gesellschaft: Solingen | HRB 25917| Amtsgericht Wuppertal
> Vorstand: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
> Aufsichtsrat: Patric Fedlmeier (Vorsitzender) . Klaus Jäger . Jürgen Schütz
>
>



-- 

Stefania Alborghetti

Apache Cassandra Software Engineer

|+852 6114 9265| stefania.alborghe...@datastax.com


Re: Hi Memory consumption with Copy command

2016-04-23 Thread Stefania Alborghetti
That's really excellent! Thank you so much for sharing the results.

Regarding sstableloader, I am not familiar with its performance so I cannot
make any recommendation as I've never compared it with COPY FROM.

I have however compared COPY FROM with another bulk import tool,
cassandra-loader, <https://github.com/brianmhess/cassandra-loader/releases>
during the tests for CASSANDRA-11053. COPY FROM should now be as efficient
as this tool if not better (depending on data sets and test environment).

There is then this presentation
<http://www.slideshare.net/BrianHess4/bulk-loading-into-cassandra>, from
Cassandra Summit 2015, where it compares sstableloader, cassandra-loader
and the "old" COPY FROM. According to the results at slide #18,
sstableloader is slightly better than cassandra-loader for small records,
then the sstableloader performance decreases as the record size increases.

So my guess would be that sstableloader may or may not be better, depending
on the record size. If it is better, I would think that the difference
should be minimal. Sorry this is not very accurate, but that's the best I
have.


On Sat, Apr 23, 2016 at 6:00 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:

> I built cython and disabled bundled driver, the performance has been
> impressive. Memory issue is resolved and Im currently getting around
> 100,000 rows per second, its stressing both the client CPU as well as
> cassandra nodes. Thats the fastest I have ever seen it perform. With 60
> Million rows already transferred in ~5 Minutes.
>
> Just a final question before we close this thread, at this performance
> level would you recommend sstable loader or copy command?
>
> On Sat, Apr 23, 2016 at 2:00 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Thanks Stefania for the informative answer.  The next blog was pretty
>> useful as well:
>> http://www.datastax.com/dev/blog/how-we-optimized-cassandra-cqlsh-copy-from
>> . Ill upgrade to 3.0.5 and test with C extensions enabled and report on
>> this thread.
>>
>> On Sat, Apr 23, 2016 at 8:54 AM, Stefania Alborghetti <
>> stefania.alborghe...@datastax.com> wrote:
>>
>>> Hi Bhuvan
>>>
>>> Support for large datasets in COPY FROM was added by CASSANDRA-11053
>>> <https://issues.apache.org/jira/browse/CASSANDRA-11053>, which is
>>> available in 2.1.14, 2.2.6, 3.0.5 and 3.5. Your scenario is valid with this
>>> patch applied.
>>>
>>> The 3.0.x and 3.x releases are already available, whilst the other two
>>> releases are due in the next few days. You only need to install an
>>> up-to-date release on the machine where COPY FROM is running.
>>>
>>> You may find the setup instructions in this blog
>>> <http://www.datastax.com/dev/blog/six-parameters-affecting-cqlsh-copy-from-performance>
>>> interesting. Specifically, for large datasets, I would highly recommend
>>> installing the Python driver with C extensions, as it will speed things up
>>> considerably. Again, this is only possible with the 11053 patch. Please
>>> ignore the suggestion to also compile the cqlsh copy module itself with C
>>> extensions (Cython), as you may hit CASSANDRA-11574
>>> <https://issues.apache.org/jira/browse/CASSANDRA-11574> in the 3.0.5
>>> and 3.5 releases.
>>>
>>> Before CASSANDRA-11053, the parent process was a bottleneck. This is
>>> explained further in  this blog
>>> <http://www.datastax.com/dev/blog/how-we-optimized-cassandra-cqlsh-copy-from>,
>>> second paragraph in the "worker processes" section. As a workaround, if you
>>> are unable to upgrade, you may try reducing the INGESTRATE and introducing
>>> a few extra worker processes via NUMPROCESSES. Also, the parent process is
>>> overloaded and is therefore not able to report progress correctly.
>>> Therefore, if the progress report is frozen, it doesn't mean the COPY
>>> OPERATION is not making progress.
>>>
>>> Do let us know if you still have problems, as this is new functionality.
>>>
>>> With best regards,
>>> Stefania
>>>
>>>
>>> On Sat, Apr 23, 2016 at 6:34 AM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm trying to copy a 20 GB CSV file into a 3 node fresh cassandra
>>>> cluster with 32 GB memory each, sufficient disk, RF-1 and durable write
>>>> false. The machine I'm feeding into is external to the cluster and shares a
>>>> 1 Gbps line and has 16 GB RAM. (We have chosen this setup to possibly reduce
>>>> CPU and IO usage.)

Re: Hi Memory consumption with Copy command

2016-04-22 Thread Stefania Alborghetti
Hi Bhuvan

Support for large datasets in COPY FROM was added by CASSANDRA-11053
<https://issues.apache.org/jira/browse/CASSANDRA-11053>, which is available
in 2.1.14, 2.2.6, 3.0.5 and 3.5. Your scenario is valid with this patch
applied.

The 3.0.x and 3.x releases are already available, whilst the other two
releases are due in the next few days. You only need to install an
up-to-date release on the machine where COPY FROM is running.

You may find the setup instructions in this blog
<http://www.datastax.com/dev/blog/six-parameters-affecting-cqlsh-copy-from-performance>
interesting. Specifically, for large datasets, I would highly recommend
installing the Python driver with C extensions, as it will speed things up
considerably. Again, this is only possible with the 11053 patch. Please
ignore the suggestion to also compile the cqlsh copy module itself with C
extensions (Cython), as you may hit CASSANDRA-11574
<https://issues.apache.org/jira/browse/CASSANDRA-11574> in the 3.0.5 and
3.5 releases.
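
As a rough sketch of that kind of setup (illustrative only: it assumes pip
and a C compiler are available, and CQLSH_NO_BUNDLED is the environment
variable that makes cqlsh pick up the installed driver instead of the
bundled zip):

pip install cython
pip install cassandra-driver    # builds the driver's C extensions if a compiler is present
export CQLSH_NO_BUNDLED=true    # make cqlsh use the installed driver
cqlsh --debug                   # the "Using CQL driver:" line confirms which driver is loaded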

Before CASSANDRA-11053, the parent process was a bottleneck. This is
explained further in  this blog
<http://www.datastax.com/dev/blog/how-we-optimized-cassandra-cqlsh-copy-from>,
second paragraph in the "worker processes" section. As a workaround, if you
are unable to upgrade, you may try reducing the INGESTRATE and introducing
a few extra worker processes via NUMPROCESSES. Also, the parent process is
overloaded and therefore not able to report progress correctly, so if the
progress report appears frozen, it doesn't mean the COPY operation is not
making progress.
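
Purely as an illustration (the keyspace, table, file name and values below
are made up; tune them to your data set and hardware):

COPY myks.mytable FROM 'data.csv' WITH INGESTRATE = 50000 AND NUMPROCESSES = 8;

Lowering INGESTRATE reduces the pressure on the parent process, while
NUMPROCESSES controls how many worker processes share the load.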

Do let us know if you still have problems, as this is new functionality.

With best regards,
Stefania


On Sat, Apr 23, 2016 at 6:34 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:

> Hi,
>
> I'm trying to copy a 20 GB CSV file into a 3 node fresh cassandra cluster
> with 32 GB memory each, sufficient disk, RF-1 and durable write false. The
> machine I'm feeding into is external to the cluster and shares a 1 Gbps line
> and has 16 GB RAM. (We have chosen this setup to possibly reduce CPU and IO
> usage.)
>
> I'm trying to use the COPY command to feed in data. It kicks off well, launches
> a set of processes, and does about 50,000 rows per second. But I can see that
> the parent process keeps accumulating memory, almost the size of the data
> processed, and after a point the processes just hang. The parent process was
> consuming 95% of system memory when it had processed around 60% of the data.
>
> I had earlier tried to feed in data from multiple files (less than 4 GB
> each) and it was working as expected.
>
> Is it a valid scenario?
>
> Regards,
> Bhuvan
>



-- 



Stefania Alborghetti

Apache Cassandra Software Engineer

|+852 6114 9265| stefania.alborghe...@datastax.com




Re: What does FileCacheService's log message (invalidating cache) mean?

2016-03-20 Thread Stefania Alborghetti
>
> Does this mean RAR(s) must be created and added to the cache
> (FileCacheService) whenever an SSTable is opened, even in the case of
> compaction? I think a random access read isn't needed to read the data from an
> SSTable during compaction, because the SSTable is already sorted by key, so
> only a sequential read is needed for the merge.



RandomAccessReader is the name of the class (an implementation of
FileDataInput), but we use them for sequential scanning as well. We don't
necessarily put them in the cache, but most of the time we do. On
compaction, if there is a rate limiter on a reader (compaction throughput >
0) then we don't store them in the cache. Disk access mode mmap (the
default on 64-bit machines) is also treated differently, as we cache the
memory mapped segments, not the readers, in 2.2.
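
For context, both knobs mentioned above are cassandra.yaml settings; a sketch
with placeholder values (disk_access_mode is usually not listed in the
shipped yaml and defaults to auto, which resolves to mmap on 64-bit machines):

compaction_throughput_mb_per_sec: 16   # any value > 0 rate-limits compaction readers, which then bypass the cache
disk_access_mode: auto                 # with mmap, the memory mapped segments are cached rather than the readers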


On Thu, Mar 17, 2016 at 8:40 PM, Satoshi Hikida <sahik...@gmail.com> wrote:

> Hi, Stefania
>
> Thank you for your advice, again!!
>
> Could I ask you for another question?
>
> > Each sstable has one or more random access readers (one per segment for
> example) and FileCacheService is a cache for such readers
> Does this mean RAR(s) must be created and added to the cache
> (FileCacheService) whenever an SSTable is opened, even in the case of
> compaction? I think a random access read isn't needed to read the data from an
> SSTable during compaction, because the SSTable is already sorted by key, so
> only a sequential read is needed for the merge.
>
> If I have something wrong, I'm glad if you could correct.
>
>
> Regards,
> Satoshi
>
>
> On Thu, Mar 17, 2016 at 5:19 PM, Stefania Alborghetti <
> stefania.alborghe...@datastax.com> wrote:
>
>> Q1. Readers are created as needed, there is no fixed number. For example,
>> we may have 2 threads scanning sstables at the same time due to 2 different
>> CQL SELECT statements.
>>
>> Q2. There is no correlation between sstable size and JVM HEAP size. We
>> don't load entire sstables in memory.
>>
>> Q3. It's difficult to say what caused the invalidation messages,
>> basically anything that removed sstables from memory, such as dropping the
>> table, snapshots, compactions, or streaming; there may be other operations I'm
>> not familiar with.
>>
>> Q4. Correct, these are temporary files. Once again, in 3.0 things are
>> different and the temporary files have been replaced by transaction logs
>> (CASSANDRA-7066).
>>
>>
>> On Thu, Mar 17, 2016 at 3:40 PM, Satoshi Hikida <sato...@imagine-orb.com>
>> wrote:
>>
>>> Sorry there is a mistake in my previous post. I would correct it.
>>>
>>> In Q3, I mentioned there are a lot of invalidating messages in the
>>> debug.log. It is true but cassandra configurations were wrong. In that
>>> case, the cassandra.yaml configurations are as follows:
>>>
>>> - cassandra.yaml
>>> - compaction_throughput_mb_per_sec: 0 (not 8 or default)
>>> - concurrent_compactors: 1
>>> - sstable_preemptive_open_interval_in_mb: 0  (not 8 or default)
>>> - memtable_flush_writers: 1
>>>
>>> More precisely, in that case, Cassandra kept on outputting
>>> invalidating messages for a while (a few hours). However, CPU usage was almost
>>> 0.0% in top, as shown below.
>>>
>>> $ top -bu cassandra -n 1
>>> ...
>>> PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
>>> COMMAND
>>> 2631 cassand+  20   0  0.250t 1.969g 703916 S   0.0 57.0   8459:35
>>> java
>>>
>>> I want to know what was actually happening at that time.
>>>
>>>
>>> Regards,
>>> Satoshi
>>>
>>>
>>> On Thu, Mar 17, 2016 at 3:56 PM, Satoshi Hikida <sahik...@gmail.com>
>>> wrote:
>>>
>>>> Thank you for your very useful advice!
>>>>
>>>>
>>>> Definitely, I'm using Cassandra v2.2.5, not 3.x. Basically I've
>>>> understood what these logs mean, but I have a few more questions, so I
>>>> would very much appreciate some explanations about them.
>>>>
>>>> * Q1.
>>>> In my understanding, when an SSTable is opened, a number of
>>>> RandomAccessReaders (RARs) are created, equal to the number of segments of
>>>> the SSTable. Is the number of segments (=RARs) equal to the following?
>>>>
>>>> a number of segments = size of SSTable / size of segments
>>>>
>>>> * Q2.
>>>> What happens if Cassandra opens an SSTable file which is bigger than the JVM
>>>> heap (or memory)?

Re: What does FileCacheService's log message (invalidating cache) mean?

2016-03-19 Thread Stefania Alborghetti
Q1. Readers are created as needed, there is no fixed number. For example,
we may have 2 threads scanning sstables at the same time due to 2 different
CQL SELECT statements.

Q2. There is no correlation between sstable size and JVM HEAP size. We
don't load entire sstables in memory.

Q3. It's difficult to say what caused the invalidation messages, basically
anything that removed sstables from memory, such as dropping the table,
snapshots, compactions, or streaming; there may be other operations I'm not
familiar with.

Q4. Correct, these are temporary files. Once again, in 3.0 things are
different and the temporary files have been replaced by transaction logs
(CASSANDRA-7066).


On Thu, Mar 17, 2016 at 3:40 PM, Satoshi Hikida <sato...@imagine-orb.com>
wrote:

> Sorry there is a mistake in my previous post. I would correct it.
>
> In Q3, I mentioned there are a lot of invalidating messages in the
> debug.log. It is true but cassandra configurations were wrong. In that
> case, the cassandra.yaml configurations are as follows:
>
> - cassandra.yaml
> - compaction_throughput_mb_per_sec: 0 (not 8 or default)
> - concurrent_compactors: 1
> - sstable_preemptive_open_interval_in_mb: 0  (not 8 or default)
> - memtable_flush_writers: 1
>
> More precisely, in that case, Cassandra kept on outputting
> invalidating messages for a while (a few hours). However, CPU usage was almost
> 0.0% in top, as shown below.
>
> $ top -bu cassandra -n 1
> ...
> PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
> COMMAND
> 2631 cassand+  20   0  0.250t 1.969g 703916 S   0.0 57.0   8459:35 java
>
> I want to know what was actually happening at that time.
>
>
> Regards,
> Satoshi
>
>
> On Thu, Mar 17, 2016 at 3:56 PM, Satoshi Hikida <sahik...@gmail.com>
> wrote:
>
>> Thank you for your very useful advice!
>>
>>
>> Definitely, I'm using Cassandra v2.2.5, not 3.x. Basically I've
>> understood what these logs mean, but I have a few more questions, so I
>> would very much appreciate some explanations about them.
>>
>> * Q1.
>> In my understanding, when an SSTable is opened, a number of RandomAccessReaders (RARs)
>> are created, equal to the number of segments of the SSTable.
>> Is the number of segments (=RARs) equal to the following?
>>
>> a number of segments = size of SSTable / size of segments
>>
>> * Q2.
>> What happens if Cassandra opens an SSTable file which is bigger than the JVM
>> heap (or memory)?
>>
>> * Q3.
>> In my case, there are a lot of invalidating messages for the same SSTable
>> file (e.g. at least 11 records for tmplink-la-8348-big-Data.db in my
>> previous post). In some cases, there are more than 600 invalidating
>> messages for the same file, and these messages were logged for a few hours. Could
>> closing a big SSTable be the cause?
>>
>> * Q4.
>> I saw "tmplink-xxx" or "tmp-xxx" files in the logs and also data
>> directories. Are these files temporary in compaction process?
>>
>>
>> Here is my experimental configurations.
>>
>> - Cassandra node: An aws EC2 instance(t2.medium. 4GBRAM, 2vCPU)
>> - Cassandra version: 2.2.5
>> - inserted data size: about 100GB
>> - cassandra-env.sh: default
>> - cassandra.yaml
>> - compaction_throughput_mb_per_sec: 8 (or default)
>> - concurrent_compactors: 1
>> - sstable_preemptive_open_interval_in_mb: 25 (or default)
>> - memtable_flush_writers: 1
>>
>>
>> Regards,
>> Satoshi
>>
>>
>> On Wed, Mar 16, 2016 at 5:47 PM, Stefania Alborghetti <
>> stefania.alborghe...@datastax.com> wrote:
>>
>>> Each sstable has one or more random access readers (one per segment for
>>> example) and FileCacheService is a cache for such readers. When an sstable
>>> is closed, the cache is invalidated. If no single reader of an sstable is
>>> used for at least 512 milliseconds, all readers are evicted. If the sstable
>>> is opened again, new reader(s) will be created and added to the cache again.
>>>
>>> FileCacheService was removed in cassandra 3.0 in favour of a pool of
>>> page-aligned buffers, and sharing the NIO file channels amongst the readers
>>> of an sstable, refer to CASSANDRA-8897
>>> <https://issues.apache.org/jira/browse/CASSANDRA-8897> and
>>> CASSANDRA-8893 <https://issues.apache.org/jira/browse/CASSANDRA-8893>
>>> for more details.
>>>
>>> On Wed, Mar 16, 2016 at 3:30 PM, satoshi hikida <sahik...@gmail.com>
>>> wrote:
>>>
>>>> Hi,

Re: What does FileCacheService's log message (invalidating cache) mean?

2016-03-16 Thread Stefania Alborghetti
Each sstable has one or more random access readers (one per segment for
example) and FileCacheService is a cache for such readers. When an sstable
is closed, the cache is invalidated. If no single reader of an sstable is
used for at least 512 milliseconds, all readers are evicted. If the sstable
is opened again, new reader(s) will be created and added to the cache again.

FileCacheService was removed in cassandra 3.0 in favour of a pool of
page-aligned buffers, and sharing the NIO file channels amongst the readers
of an sstable, refer to CASSANDRA-8897
<https://issues.apache.org/jira/browse/CASSANDRA-8897> and CASSANDRA-8893
<https://issues.apache.org/jira/browse/CASSANDRA-8893> for more details.

On Wed, Mar 16, 2016 at 3:30 PM, satoshi hikida <sahik...@gmail.com> wrote:

> Hi,
>
> I have been working on some experiments with Cassandra and found some log
> messages in debug.log, shown below.
> I am not sure what they mean exactly, so I would appreciate it if someone
> could give me some explanation.
>
> In my verification, a Cassandra node runs as a stand-alone server on an
> Amazon EC2 instance (t2.medium), and I insert 1 billion records (about 100 GB
> of data) into a table from a client application (which runs on another
> instance, separate from the Cassandra node). After the insertion, Cassandra
> continues its I/O activities for (probably) compaction and keeps logging
> messages like the following:
>
> ---
> ...
> DEBUG [NonPeriodicTasks:1] 2016-03-16 09:59:25,170
> FileCacheService.java:102 - Evicting cold readers for
> /var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/la-6-big-Data.db
> DEBUG [NonPeriodicTasks:1] 2016-03-16 09:59:31,780
> FileCacheService.java:177 - Invalidating cache for
> /var/lib/cassandra/data/test/user-3d988520e9e011e59d830f00df8833fa/tmplink-la-8348-big-Data.db
> DEBUG [NonPeriodicTasks:1] 2016-03-16 09:59:36,899
> FileCacheService.java:177 - Invalidating cache for
> /var/lib/cassandra/data/test/user-3d988520e9e011e59d830f00df8833fa/tmplink-la-8348-big-Data.db
> DEBUG [NonPeriodicTasks:1] 2016-03-16 09:59:42,187
> FileCacheService.java:177 - Invalidating cache for
> /var/lib/cassandra/data/test/user-3d988520e9e011e59d830f00df8833fa/tmplink-la-8348-big-Data.db
> DEBUG [NonPeriodicTasks:1] 2016-03-16 09:59:47,308
> FileCacheService.java:177 - Invalidating cache for
> /var/lib/cassandra/data/test/user-3d988520e9e011e59d830f00df8833fa/tmplink-la-8348-big-Data.db
> ...
> ---
>
> I guess these messages are related to the compaction process, and that
> FileCacheService was invalidating the cache associated with an SSTable
> file. But I'm not sure what it actually means. When is the cache
> invalidated? And what happens after cache invalidation?
>
>
> Regards,
> Satoshi
>



-- 



Stefania Alborghetti

Apache Cassandra Software Engineer

|+852 6114 9265| stefania.alborghe...@datastax.com


Re: Cassandra-stress output

2016-03-09 Thread Stefania Alborghetti
On Tue, Mar 8, 2016 at 8:39 PM, Jean Carlo <jean.jeancar...@gmail.com>
wrote:

> Hi guys,
>
> I use cassandra-stress to populate the following table
>
> CREATE TABLE cf1 (
> kvalue text,
> ktype text,
> prov text,
> dname text,
> dattrib blob,
> dvalue text,
> PRIMARY KEY (kvalue, ktype, prov, dname)
>   ) WITH bloom_filter_fp_chance = 0.01
>  AND caching = '{"keys":"ALL", "rows_per_partition":"60"}'
> AND comment = ''
> AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.SnappyCompressor'}
> AND dclocal_read_repair_chance = 0.02
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.01
> AND speculative_retry = '99.0PERCENTILE';
>
> And cassandra-stress generates the following string for the field kvalue of
> type text:
>
> "P*d,xY\x03m\x1b\x10\x0b$\x04pt-G\x08\n`7\x1fs\x15kH\x02i1\x16jf%YM"
>
> What bothers me is that kvalue has control characters like \x03. Do you
> guys know any way to avoid creating these kinds of characters while using
> cassandra-stress?
>
>
>
> Thank you very much
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>


There is no way to avoid the control characters (<32 and ==127), other than
modifying the source code, which is located in
tools/stress/src/org/apache/cassandra/stress/generate/values/Strings.java.

Changing this line:

chars[i++] = (char) (((v & 127) + 32) & 127);

with this:

chars[i++] = (char) (((v & 127) % 95) + 32);

should work but I could not avoid the expensive modulo operation. You can
rebuild cassandra-stress with ant stress-build.
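
To double-check the effect of the change, here is a small standalone sketch
(not part of cassandra-stress) that exercises both expressions over every
possible input byte:

public class StressCharCheck {
    public static void main(String[] args) {
        boolean oldHasControl = false, newHasControl = false;
        for (int v = 0; v < 256; v++) {
            char oldC = (char) (((v & 127) + 32) & 127);  // current mapping: can wrap into 0..31 and 127
            char newC = (char) (((v & 127) % 95) + 32);   // proposed mapping: always 32..126
            if (oldC < 32 || oldC == 127) oldHasControl = true;
            if (newC < 32 || newC == 127) newHasControl = true;
        }
        System.out.println("current mapping emits control chars: " + oldHasControl);   // prints true
        System.out.println("proposed mapping emits control chars: " + newHasControl);  // prints false
    }
}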

I wonder, however, whether the original intention was to avoid control
characters, given the +32 in the original line. For this reason I've copied this
message to the dev mailing list.


-- 



Stefania Alborghetti

Apache Cassandra Software Engineer

|+852 6114 9265| stefania.alborghe...@datastax.com


Re: Possible bug in Cassandra

2016-03-09 Thread Stefania Alborghetti
Thank you for reporting this.

I've filed https://issues.apache.org/jira/browse/CASSANDRA-11333.

On Thu, Mar 10, 2016 at 6:16 AM, Rakesh Kumar <dcrunch...@aim.com> wrote:

> Cassandra : 3.3
> CQLSH  : 5.0.1
>
> If there is a typo in the column name of the copy command, we get this:
>
> copy mytable
> (event_id,event_class_cd,event_ts,receive_ts,event_source_instance,client_id,client_id_type,event_tag,event_udf,client_event_date)
> from '/pathtofile.dat'
> with DELIMITER = '|' AND NULL = 'NULL' AND DATETIMEFORMAT='%Y-%m-%d
> %H:%M:%S.%f' ;
>
> Starting copy of mytable with columns ['event_id', 'event_class_cd',
> 'event_ts', 'receive_ts', 'event_source_instance', 'client_id',
> 'client_id_type', 'event_tag', 'event_udf', 'event_client_date'].
>
> load_ravitest.cql:5:13 child process(es) died unexpectedly, aborting
>
> the typo was in the name of event_client_date. It should have been
> client_event_date.
>
>


-- 



Stefania Alborghetti

Apache Cassandra Software Engineer

|+852 6114 9265| stefania.alborghe...@datastax.com


Re: CASSANDRA-8072

2016-02-09 Thread Stefania Alborghetti
Can you make sure you changed the listen address on both hosts, using their
respective public IP addresses, and that the seeds and cluster name are the
same on both hosts? The seeds should contain the public IP of the seed node.

Then verify you can telnet from one host to the other and vice versa using
the internode port. This is normally 7000, the storage port in the yaml
file.

If you are still having problems, you can enable TRACE level logging in
conf/logback.xml; this should log connection errors and gossip exchanges.
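
A minimal sketch of what that configuration might look like (the addresses
are placeholders, and 7000 is the default storage_port):

# cassandra.yaml, on each node
cluster_name: 'cass'
listen_address: <public IP of this node>
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "<public IP of the seed node>"

# connectivity check, run from each host towards the other
telnet <public IP of the other node> 7000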




On Wed, Feb 10, 2016 at 2:28 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> On XX.YY :
>
> #  service iptables status
> iptables: Firewall is not running.
>
> I put public IP address for listen address on a non-seed node. I still
> got:
>
> INFO  18:14:17  OutboundTcpConnection using coalescing strategy DISABLED
> ERROR 18:14:48  Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
> at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1337)
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:541)
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:789)
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>
> Cheers
>
> On Mon, Feb 8, 2016 at 5:36 PM, Stefania Alborghetti <
> stefania.alborghe...@datastax.com> wrote:
>
>> Have you checked that you can telnet from one node to the other using the
>> same ip and the internode port?
>>
>> I would put the public IP addresses of the seeds in the seed list and set
>> the listen address to the public IP address for each node.
>>
>> There was a similar discussion
>> <https://mail-archives.apache.org/mod_mbox/cassandra-user/201602.mbox/%3CCAEQiCCVC3AygJAiPOVJJ4uG2wYQbgihEkbA_1BoBNUYc1uKaLw%40mail.gmail.com%3E>
>> recently that might help.
>>
>> On Tue, Feb 9, 2016 at 8:48 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Thanks for the help, Stefania.
>>> By using "127.0.0.1" , I was able to start Cassandra on that seed node
>>> (XX.YY).
>>> However, on other nodes, I pointed seed to XX.YY and observed the
>>> following ?
>>> What did I miss ?
>>>
>>>
>>> INFO  [main] 2016-02-08 16:44:56,607  OutboundTcpConnection.java:97 -
>>> OutboundTcpConnection using coalescing strategy DISABLED
>>> ERROR [main] 2016-02-08 16:45:27,626  CassandraDaemon.java:581 -
>>> Exception encountered during startup
>>> java.lang.RuntimeException: Unable to gossip with any seeds
>>> at
>>> org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1337)
>>> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>>> at
>>> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:541)
>>> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>>> at
>>> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:789)
>>> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>>> at
>>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:721)
>>> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>>> at
>>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:612)
>>> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>>> at
>>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:389)
>>> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>>> at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:335)
>>> ~[dse-core-4.8.4.jar:4.8.4]
>>> at
>>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:564)
>>> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>>> at com.datastax.bdp.DseModule.main(DseModule.java:74)
>>> [dse-core-4.8.4.jar:4.8.4]
>>> INFO  [Thread-2] 2016-02-08 16:45:27,629  DseDaemon.java:418 - DSE
>>> shutting down...
>>>
>>> On Mon, Feb 8, 2016 at 4:25 PM, Stefania Alborghetti <
>>> stefania.alborghe...@datastax.com> wrote:
>>>
>>>> CASSANDRA-8072 is not going to help you because the code that fails
>>>> (checkForEndpointCollision()) should not execute for seeds.
>>>>
>>>> I think the problem is that there are no seeds in cassandra.yaml:
>>>>
>>>> - seeds: "XX.YY"
>>>>
>>>> If listen_address is localhost then try:

Re: CASSANDRA-8072

2016-02-08 Thread Stefania Alborghetti
Have you checked that you can telnet from one node to the other using the
same ip and the internode port?

I would put the public IP addresses of the seeds in the seed list and set
the listen address to the public IP address for each node.

There was a similar discussion
<https://mail-archives.apache.org/mod_mbox/cassandra-user/201602.mbox/%3CCAEQiCCVC3AygJAiPOVJJ4uG2wYQbgihEkbA_1BoBNUYc1uKaLw%40mail.gmail.com%3E>
recently that might help.

On Tue, Feb 9, 2016 at 8:48 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Thanks for the help, Stefania.
> By using "127.0.0.1" , I was able to start Cassandra on that seed node
> (XX.YY).
> However, on other nodes, I pointed seed to XX.YY and observed the
> following ?
> What did I miss ?
>
>
> INFO  [main] 2016-02-08 16:44:56,607  OutboundTcpConnection.java:97 -
> OutboundTcpConnection using coalescing strategy DISABLED
> ERROR [main] 2016-02-08 16:45:27,626  CassandraDaemon.java:581 - Exception
> encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
> at
> org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1337)
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:541)
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:789)
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:721)
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:612)
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:389)
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:335)
> ~[dse-core-4.8.4.jar:4.8.4]
> at
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:564)
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at com.datastax.bdp.DseModule.main(DseModule.java:74)
> [dse-core-4.8.4.jar:4.8.4]
> INFO  [Thread-2] 2016-02-08 16:45:27,629  DseDaemon.java:418 - DSE
> shutting down...
>
> On Mon, Feb 8, 2016 at 4:25 PM, Stefania Alborghetti <
> stefania.alborghe...@datastax.com> wrote:
>
>> CASSANDRA-8072 is not going to help you because the code that fails
>> (checkForEndpointCollision()) should not execute for seeds.
>>
>> I think the problem is that there are no seeds in cassandra.yaml:
>>
>> - seeds: "XX.YY"
>>
>> If listen_address is localhost then try:
>>
>> - seeds: "127.0.0.1"
>>
>>
>> On Tue, Feb 9, 2016 at 5:58 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> If I apply the fix from CASSANDRA-8072 onto a 2.1.12 cluster, which
>>> files should I replace ?
>>>
>>> Thanks
>>>
>>> On Mon, Feb 8, 2016 at 1:07 PM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>> wrote:
>>>
>>>> Your config looks fine to me,  i tried reproducing the scenario by
>>>> setting localhost in listen_address,rpc_address and seed list, and it
>>>> worked fine, I had earlier the node local ip in the 3 fields and it was
>>>> working fine.
>>>>
>>>> Looks like there is some other issue here.
>>>>
>>>> On Tue, Feb 9, 2016 at 12:49 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> Here it is:
>>>>> http://pastebin.com/QEdjtAj6
>>>>>
>>>>> XX.YY is localhost in this case.
>>>>>
>>>>> On Mon, Feb 8, 2016 at 11:03 AM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> could you paste your cassandra.yaml here, except for commented out
>>>>>> lines?
>>>>>>
>>>>>> On Tue, Feb 9, 2016 at 12:30 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>
>>>>>>> The issue I described was observed on the seed node.
>>>>>>>
>>>>>>> Both rpc_address and listen_address point to localhost.
>>>>>>>
>>>>>>> bq. What addresses are there in the seed list?
>>>>>>>
>>>>>>> The IP of the seed node.
>>>>>>>
>>>>>>> I haven't come to starting non-seed node(s) yet.
>>>>>>>

Re: CASSANDRA-8072

2016-02-08 Thread Stefania Alborghetti
CASSANDRA-8072 is not going to help you because the code that fails
(checkForEndpointCollision()) should not execute for seeds.

I think the problem is that there are no seeds in cassandra.yaml:

- seeds: "XX.YY"

If listen_address is localhost then try:

- seeds: "127.0.0.1"


On Tue, Feb 9, 2016 at 5:58 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> If I apply the fix from CASSANDRA-8072 onto a 2.1.12 cluster, which files
> should I replace ?
>
> Thanks
>
> On Mon, Feb 8, 2016 at 1:07 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Your config looks fine to me,  i tried reproducing the scenario by
>> setting localhost in listen_address,rpc_address and seed list, and it
>> worked fine, I had earlier the node local ip in the 3 fields and it was
>> working fine.
>>
>> Looks like there is some other issue here.
>>
>> On Tue, Feb 9, 2016 at 12:49 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Here it is:
>>> http://pastebin.com/QEdjtAj6
>>>
>>> XX.YY is localhost in this case.
>>>
>>> On Mon, Feb 8, 2016 at 11:03 AM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>> wrote:
>>>
>>>> could you paste your cassandra.yaml here, except for commented out
>>>> lines?
>>>>
>>>> On Tue, Feb 9, 2016 at 12:30 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> The issue I described was observed on the seed node.
>>>>>
>>>>> Both rpc_address and listen_address point to localhost.
>>>>>
>>>>> bq. What addresses are there in the seed list?
>>>>>
>>>>> The IP of the seed node.
>>>>>
>>>>> I haven't come to starting non-seed node(s) yet.
>>>>>
>>>>> Thanks for the quick response.
>>>>>
>>>>> On Mon, Feb 8, 2016 at 10:50 AM, Bhuvan Rawal <bhu1ra...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Ted,
>>>>>>
>>>>>> Have you specified the listen_address and rpc_address? What addresses
>>>>>> are there in the seed list?
>>>>>>
>>>>>> Have you started seed first and after waiting for 30 seconds started
>>>>>> other nodes?
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 9, 2016 at 12:14 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I am trying to setup a cluster with DSE 4.8.4
>>>>>>>
>>>>>>> I added the following in resources/cassandra/conf/cassandra.yaml :
>>>>>>>
>>>>>>> cluster_name: 'cass'
>>>>>>>
>>>>>>> which resulted in:
>>>>>>>
>>>>>>> http://pastebin.com/27adxKTM
>>>>>>>
>>>>>>> This seems to be resolved by CASSANDRA-8072
>>>>>>>
>>>>>>> My question is whether there is workaround ?
>>>>>>> If not, when can I expect 2.1.13 release ?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


-- 



Stefania Alborghetti

Apache Cassandra Software Engineer

|+852 6114 9265| stefania.alborghe...@datastax.com


Re: SELECT JSON timestamp lacks timezone information

2016-02-08 Thread Stefania Alborghetti
It's cqlsh that converts timestamps to UTC and adds the timezone but for
JSON it can't do that because the conversion to JSON is done by Cassandra.

I've filed https://issues.apache.org/jira/browse/CASSANDRA-11137 to discuss
further.

On Mon, Feb 8, 2016 at 7:53 PM, Alexandre Dutra <
alexandre.du...@datastax.com> wrote:

> Sorry,
>
> I mistakenly thought that we were on the Java driver mailing list, my
> apologies. I also think you should definitely file a Jira ticket and ask
> JSON timestamps generated server-side to be 1) formatted with a format that
> mentions the timezone and 2) formatted preferably with UTC, not the JVM
> default timezone.
>
> Alexandre
>
> On Mon, Feb 8, 2016 at 12:23 PM Ralf Steppacher <ralf.viva...@gmail.com>
> wrote:
>
>> Hi Alexandre.
>>
>> I wrote to ‘user@cassandra.apache.org’.
>>
>> Re the actual problem: I am aware of the fact that C* does not store
>> (need not store) the timezone, as it is persisted as a Unix epoch
>> timestamp. Not delivering a timezone in the JSON text representation would
>> be OK-ish if the text representation were guaranteed to be in UTC. But
>> it is not. It is in some timezone determined by the locale of the server
>> side or that of the client VM. That way it is a pain in two ways, as
>>
>> a) I have to add the timezone in a post-processing step to all timestamps
>> in my JSON responses and
>> b) I also have to do some guesswork at what the actual timezone might be
>>
>> If there is no way to control the formatting of JSON timestamps and to
>> add the time zone information, then IMHO that is bug. Is it not? Or am I
>> missing something here?
>>
>>
>> Thanks!
>> Ralf
>>
>>
>> On 08.02.2016, at 12:06, Alexandre Dutra <alexandre.du...@datastax.com>
>> wrote:
>>
>> Hello Ralf,
>>
>> First of all, Cassandra stores timestamps without timezone information,
>> so it's not possible to retrieve the original timezone used when inserting
>> the value.
>>
>> CQLSH uses the python driver behind the scenes, and my guess is that the
>> timestamp formatting is being done driver-side (hence the timezone), whereas
>> when you call toJson(), the formatting has to be done server-side.
>>
>> That said, it does seem that Cassandra is using a format without timezone
>> when converting timestamps to JSON format:
>>
>> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/serializers/TimestampSerializer.java#L52
>>
>> I agree with you that a date format that would include the timezone would
>> be preferable here, but that is a question you should ask in the Cassandra
>> Users mailing list instead.
>>
>> Hope that helps,
>>
>> Alexandre
>>
>>
>>
>>
>> On Mon, Feb 8, 2016 at 11:09 AM Ralf Steppacher <ralf.viva...@gmail.com>
>> wrote:
>>
>>> Hello all,
>>>
>>> When I select a timestamp as JSON from Cassandra, the string
>>> representation lacks the timezone information, both via CQLSH and the Java
>>> Driver:
>>>
>>> cqlsh:events> select toJson(created_at) AS created_at from
>>> event_by_patient_timestamp ;
>>>
>>>  created_at
>>> ---
>>>  "2016-01-04 16:05:47.123"
>>>
>>> (1 rows)
>>>
>>> vs.
>>>
>>> cqlsh:events> select created_at FROM event_by_user_timestamp ;
>>>
>>>  created_at
>>> --
>>>  2016-01-04 15:05:47+
>>>
>>> (1 rows)
>>> cqlsh:events>
>>>
>>> To make things even more complicated the JSON timestamp is not returned
>>> in UTC. Is there a way to either tell the driver/C* to return the JSON date
>>> in UTC or add the timezone information (much preferred) to the text
>>> representation of the timestamp?
>>>
>>>
>>> Thanks!
>>> Ralf
>>
>> --
>> Alexandre Dutra
>> Driver & Tools Engineer @ DataStax
>>
>>
>> --
> Alexandre Dutra
> Driver & Tools Engineer @ DataStax
>



-- 



Stefania Alborghetti

Apache Cassandra Software Engineer

|+852 6114 9265| stefania.alborghe...@datastax.com


Re: Can't select count(*)

2016-02-01 Thread Stefania Alborghetti
Regarding select count(*), the timeout is probably client side. Try
changing the default request timeout in cqlsh via --request-timeout. By
default it is 10 seconds. Refer to "cqlsh --help" for more details, but
basically "cqlsh --request-timeout=30" should work.

Regarding COPY TO/FROM, these commands were recently enhanced and should be
available in 2.2.5 (not yet released). More details in this blog post:
http://www.datastax.com/dev/blog/new-features-in-cqlsh-copy. The problem
with COPY FROM prior to this enhancement is that it only contacts one
replica, so it is subject to coordinator timeouts regardless of the size of
the cluster, and it does not retry on timeouts. It just aborts.

Release 2.2.5 should be available soon but if you cannot wait till then you
can try downloading the source code for 2.2 HEAD and running cqlsh from
there. It should be compatible.

EC2 t2.small is probably too weak but I don't know enough about AWS
instances to comment further about this.

On Mon, Feb 1, 2016 at 3:56 PM, Ivan Zelensky <bezbo...@gmail.com> wrote:

> Hi all! I have a table with a simple primary key (one field only) and ~1
> million records. The table is stored on a single-node C* 2.2.4.
> Problem: when I try to execute "SELECT count(*) FROM my_table;", the
> operation times out.
> As I understand it, 1 million rows is not such a big dataset that MapReduce
> is needed to count it, so I think something is wrong with the configuration.
> Also I can't do "COPY TO/FROM" when the dataset > 30,000 rows.
> Maybe the hardware is too weak (AWS EC2 t2.small), but even on
> t2.large I had a timeout on COPY with just 300,000 rows.
>
> My configuration is the default config from the deb package. Maybe somebody
> knows what I should tweak there?
>
> Thank you.
>



-- 



Stefania Alborghetti

Apache Cassandra Software Engineer

|+852 6114 9265| stefania.alborghe...@datastax.com