Re: Too many open files Cassandra 2.1.11.872

2015-11-06 Thread Jason Lewis
cat /proc/5980/limits
Limit                     Soft Limit   Hard Limit   Units
Max cpu time              unlimited    unlimited    seconds
Max file size             unlimited    unlimited    bytes
Max data size             unlimited    unlimited    bytes
Max stack size            8388608      unlimited    bytes
Max core file size        0            unlimited    bytes
Max resident set          unlimited    unlimited    bytes
Max processes             2063522      2063522      processes
Max open files            10           10           files
Max locked memory         unlimited    unlimited    bytes
Max address space         unlimited    unlimited    bytes
Max file locks            unlimited    unlimited    locks
Max pending signals       2063522      2063522      signals
Max msgqueue size         819200       819200       bytes
Max nice priority         0            0
Max realtime priority     0            0
Max realtime timeout      unlimited    unlimited    us
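
For comparison, a rough count of the descriptors that PID actually holds (a
sketch; 5980 is the Cassandra PID from the output above):

ls /proc/5980/fd | wc -l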


On Fri, Nov 6, 2015 at 4:01 PM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> You probably need to configure ulimits correctly.
>
> What does this give you?
>
> /proc//limits
>
>
> All the best,
>
>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
> 
> 
> 
>
>
> 
>
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world’s most innovative enterprises.
> DataStax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the
> database technology and transactional backbone of choice for the world’s
> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>
> On Fri, Nov 6, 2015 at 1:56 PM, Branton Davis 
> wrote:
>
>> We recently went down the rabbit hole of trying to understand the output
>> of lsof.  lsof -n has a lot of duplicates (files opened by multiple
>> threads).  Use 'lsof -p $PID' or 'lsof -u cassandra' instead.
>>
>> On Fri, Nov 6, 2015 at 12:49 PM, Bryan Cheng 
>> wrote:
>>
>>> Is your compaction progressing as expected? If not, this may cause an
>>> excessive number of tiny db files. Had a node refuse to start recently
>>> because of this, had to temporarily remove limits on that process.
>>>
>>> On Fri, Nov 6, 2015 at 10:09 AM, Jason Lewis 
>>> wrote:
>>>
 I'm getting too many open files errors and I'm wondering what the
 cause may be.

 lsof -n | grep java  shows 1.4M files

 ~90k are inodes
 ~70k are pipes
 ~500k are cassandra services in /usr
 ~700k are the data files.

 What might be causing so many files to be open?

 jas

>>>
>>>
>>
>


Re: Re: Too many open files Cassandra 2.1.11.872

2015-11-06 Thread 郝加来
Many connections?





郝加来

From: Jason Lewis
Date: 2015-11-07 10:38
To: user@cassandra.apache.org
Subject: Re: Too many open files Cassandra 2.1.11.872
cat /proc/5980/limits
Limit                     Soft Limit   Hard Limit   Units
Max cpu time              unlimited    unlimited    seconds
Max file size             unlimited    unlimited    bytes
Max data size             unlimited    unlimited    bytes
Max stack size            8388608      unlimited    bytes
Max core file size        0            unlimited    bytes
Max resident set          unlimited    unlimited    bytes
Max processes             2063522      2063522      processes
Max open files            10           10           files
Max locked memory         unlimited    unlimited    bytes
Max address space         unlimited    unlimited    bytes
Max file locks            unlimited    unlimited    locks
Max pending signals       2063522      2063522      signals
Max msgqueue size         819200       819200       bytes
Max nice priority         0            0
Max realtime priority     0            0
Max realtime timeout      unlimited    unlimited    us




On Fri, Nov 6, 2015 at 4:01 PM, Sebastian Estevez 
 wrote:

You probably need to configure ulimits correctly.


What does this give you?


/proc//limits


All the best,



Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com







DataStax is the fastest, most scalable distributed database technology, 
delivering Apache Cassandra to the world’s most innovative enterprises. 
DataStax is built to be agile, always-on, and predictably scalable to any size. 
With more than 500 customers in 45 countries, DataStax is the database 
technology and transactional backbone of choice for the world’s most innovative 
companies such as Netflix, Adobe, Intuit, and eBay. 


On Fri, Nov 6, 2015 at 1:56 PM, Branton Davis  
wrote:

We recently went down the rabbit hole of trying to understand the output of 
lsof.  lsof -n has a lot of duplicates (files opened by multiple threads).  Use 
'lsof -p $PID' or 'lsof -u cassandra' instead.


On Fri, Nov 6, 2015 at 12:49 PM, Bryan Cheng  wrote:

Is your compaction progressing as expected? If not, this may cause an excessive 
number of tiny db files. Had a node refuse to start recently because of this, 
had to temporarily remove limits on that process.


On Fri, Nov 6, 2015 at 10:09 AM, Jason Lewis  wrote:

I'm getting too many open files errors and I'm wondering what the
cause may be.

lsof -n | grep java  shows 1.4M files

~90k are inodes
~70k are pipes
~500k are cassandra services in /usr
~700k are the data files.

What might be causing so many files to be open?

jas




Do I have to use the cql in the datastax java driver?

2015-11-06 Thread Dikang Gu
Hi there,

In the datastax java driver, do I have to use the cql to talk to cassandra
cluster?

Can I still use thrift interface to talk to cassandra? Any reason that we
should not use thrift anymore?

Thanks.
-- 
Dikang


Insertion Delay Cassandra 2.1.9

2015-11-06 Thread Greg Traub
Cassandra users,

I have a 4 node Cassandra cluster set up.  All nodes are in a single rack
and distribution center.  I have a loader program which loads 40 million
rows into a table in a keyspace with a replication factor of 3.
Immediately after inserting the rows (after the loader program finishes),
if I SELECT count(*) from the table, the result is less than 40 million.
If I run our dumper program to retrieve all rows, it is less than 40
million.  However, if I wait roughly 20 minutes, the count eventually
reaches 40 million rows and the dumper program returns all 40 million.

If I do the same thing in a keyspace where the replication factor is 1, I
don't have any "stabilization" time and the 40 million rows are immediately
available.

I've modified the loading and dumping programs to use both the Thrift Java
driver and the CQL Java driver and neither seems to make a difference.

I'm very new to Cassandra and my questions are, what may be causing this
delay in all rows being available and how might I lessen/eliminate this
delay?

Thanks,
Greg


Re: Too many open files Cassandra 2.1.11.872

2015-11-06 Thread Branton Davis
We recently went down the rabbit hole of trying to understand the output of
lsof.  lsof -n has a lot of duplicates (files opened by multiple threads).
Use 'lsof -p $PID' or 'lsof -u cassandra' instead.
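
A quick sketch of the difference (assuming the Cassandra PID is in $PID;
/proc/$PID/fd is the ground truth, one entry per open descriptor):

# inflated: repeats each descriptor once per thread
lsof -n | grep java | wc -l
# per-process view, far fewer duplicates
lsof -p $PID | wc -l
# one entry per descriptor the process actually holds
ls /proc/$PID/fd | wc -l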

On Fri, Nov 6, 2015 at 12:49 PM, Bryan Cheng  wrote:

> Is your compaction progressing as expected? If not, this may cause an
> excessive number of tiny db files. Had a node refuse to start recently
> because of this, had to temporarily remove limits on that process.
>
> On Fri, Nov 6, 2015 at 10:09 AM, Jason Lewis 
> wrote:
>
>> I'm getting too many open files errors and I'm wondering what the
>> cause may be.
>>
>> lsof -n | grep java  shows 1.4M files
>>
>> ~90k are inodes
>> ~70k are pipes
>> ~500k are cassandra services in /usr
>> ~700k are the data files.
>>
>> What might be causing so many files to be open?
>>
>> jas
>>
>
>


Re: Insertion Delay Cassandra 2.1.9

2015-11-06 Thread Greg Traub
Vidur,

Forgive me if I'm getting this wrong as I'm exceptionally new to Cassandra.

By consistency, if you mean the USING CONSISTENCY clause, then I'm not
specifying it which, per the CQL documentation, means a default of ONE.

On Fri, Nov 6, 2015 at 1:49 PM, Vidur Malik  wrote:

> What is your query consistency?
>
> On Fri, Nov 6, 2015 at 1:47 PM, Greg Traub 
> wrote:
>
>> Cassandra users,
>>
>> I have a 4 node Cassandra cluster set up.  All nodes are in a single rack
>> and distribution center.  I have a loader program which loads 40 million
>> rows into a table in a keyspace with a replication factor of 3.
>> Immediately after inserting the rows (after the loader program finishes),
>> if I SELECT count(*) from the table, the result is less than 40 million.
>> If I run our dumper program to retrieve all rows, it is less than 40
>> million.  However, if I wait roughly 20 minutes, the count eventually
>> reaches 40 million rows and the dumper program returns all 40 million.
>>
>> If I do the same thing in a keyspace where the replication factor is 1, I
>> don't have any "stabilization" time and the 40 million rows are immediately
>> available.
>>
>> I've modified the loading and dumping programs to use both the Thrift
>> Java driver and the CQL Java driver and neither seems to make a difference.
>>
>> I'm very new to Cassandra and my questions are, what may be causing this
>> delay in all rows being available and how might I lessen/eliminate this
>> delay?
>>
>> Thanks,
>> Greg
>>
>
>
>
> --
>
> Vidur Malik
>
> 800.820.9814
>


Re: Insertion Delay Cassandra 2.1.9

2015-11-06 Thread Vidur Malik
Ah, I thought you may have been using a higher consistency, which would
explain your error since the data may not have been replicated across all 3
nodes when you made the query.
Anyway, it seems to be happening because of replication. What version of
Cassandra are you using? There may be an issue filed in their JIRA.

On Fri, Nov 6, 2015 at 1:58 PM, Greg Traub  wrote:

> Vidur,
>
> Forgive me if I'm getting this wrong as I'm exceptionally new to Cassandra.
>
> By consistency, if you mean the USING CONSISTENCY clause, then I'm not
> specifying it which, per the CQL documentation, means a default of ONE.
>
> On Fri, Nov 6, 2015 at 1:49 PM, Vidur Malik  wrote:
>
>> What is your query consistency?
>>
>> On Fri, Nov 6, 2015 at 1:47 PM, Greg Traub 
>> wrote:
>>
>>> Cassandra users,
>>>
>>> I have a 4 node Cassandra cluster set up.  All nodes are in a single
>>> rack and distribution center.  I have a loader program which loads 40
>>> million rows into a table in a keyspace with a replication factor of 3.
>>> Immediately after inserting the rows (after the loader program finishes),
>>> if I SELECT count(*) from the table, the result is less than 40 million.
>>> If I run our dumper program to retrieve all rows, it is less than 40
>>> million.  However, if I wait roughly 20 minutes, the count eventually
>>> reaches 40 million rows and the dumper program returns all 40 million.
>>>
>>> If I do the same thing in a keyspace where the replication factor is 1,
>>> I don't have any "stabilization" time and the 40 million rows are
>>> immediately available.
>>>
>>> I've modified the loading and dumping programs to use both the Thrift
>>> Java driver and the CQL Java driver and neither seems to make a difference.
>>>
>>> I'm very new to Cassandra and my questions are, what may be causing this
>>> delay in all rows being available and how might I lessen/eliminate this
>>> delay?
>>>
>>> Thanks,
>>> Greg
>>>
>>
>>
>>
>> --
>>
>> Vidur Malik
>>
>> 800.820.9814
>>
>
>


-- 

Vidur Malik

800.820.9814


Re: Too many open files Cassandra 2.1.11.872

2015-11-06 Thread Bryan Cheng
Is your compaction progressing as expected? If not, this may cause an
excessive number of tiny db files. Had a node refuse to start recently
because of this, had to temporarily remove limits on that process.
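
A quick way to check (a sketch; assumes nodetool is on the path):

nodetool compactionstats                   # pending tasks piling up => compaction is behind
nodetool cfstats | grep 'SSTable count'    # unusually high counts => many tiny sstables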

On Fri, Nov 6, 2015 at 10:09 AM, Jason Lewis  wrote:

> I'm getting too many open files errors and I'm wondering what the
> cause may be.
>
> lsof -n | grep java  shows 1.4M files
>
> ~90k are inodes
> ~70k are pipes
> ~500k are cassandra services in /usr
> ~700k are the data files.
>
> What might be causing so many files to be open?
>
> jas
>


Re: Insertion Delay Cassandra 2.1.9

2015-11-06 Thread Vidur Malik
What is your query consistency?

On Fri, Nov 6, 2015 at 1:47 PM, Greg Traub  wrote:

> Cassandra users,
>
> I have a 4 node Cassandra cluster set up.  All nodes are in a single rack
> and distribution center.  I have a loader program which loads 40 million
> rows into a table in a keyspace with a replication factor of 3.
> Immediately after inserting the rows (after the loader program finishes),
> if I SELECT count(*) from the table, the result is less than 40 million.
> If I run our dumper program to retrieve all rows, it is less than 40
> million.  However, if I wait roughly 20 minutes, the count eventually
> reaches 40 million rows and the dumper program returns all 40 million.
>
> If I do the same thing in a keyspace where the replication factor is 1, I
> don't have any "stabilization" time and the 40 million rows are immediately
> available.
>
> I've modified the loading and dumping programs to use both the Thrift Java
> driver and the CQL Java driver and neither seems to make a difference.
>
> I'm very new to Cassandra and my questions are, what may be causing this
> delay in all rows being available and how might I lessen/eliminate this
> delay?
>
> Thanks,
> Greg
>



-- 

Vidur Malik

800.820.9814


Re: Insertion Delay Cassandra 2.1.9

2015-11-06 Thread Bryan Cheng
Your experience, then, is expected (although a 20-minute delay seems excessive,
and is a sign you may be overloading your cluster, which is easy to do with an
unthrottled bulk load like that).

When you insert with consistency ONE on RF > 1, that means your query
returns after one node confirms the write. The write will attempt to go out
to the other nodes that are responsible for that row, but the coordinator
does not bother waiting for the response. If your nodes are overloaded,
they may not accept the write at all; failures may result in hinted handoff
being used, or just the write being dropped in general.

At the end of your load, you likely have nodes missing writes. Look for
dropped MUTATION messages in your nodetool tpstats. For operations that
cannot tolerate this, you need to write and read with a higher consistency
level.
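
A quick check for that (a sketch; the dropped-message counters sit at the
bottom of the tpstats output):

nodetool tpstats | grep -i -e dropped -e mutation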

Consistency is achieved over time via hinted handoff, read repair, and
other mechanics (assuming you're not running a repair in between). Your
cluster will gradually return to consistency, *provided your nodes do not
suffer any downtime or exceed the hint window in terms of unavailability*.
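
If the application can't wait for that, a minimal cqlsh sketch of writing and
reading at QUORUM (keyspace, table, and columns here are made up; with RF=3,
QUORUM writes plus QUORUM reads give W + R > RF, so reads see the latest write):

cqlsh> CONSISTENCY QUORUM;
cqlsh> INSERT INTO myks.mytable (id, val) VALUES (1, 'x');
cqlsh> SELECT count(*) FROM myks.mytable;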



On Fri, Nov 6, 2015 at 10:58 AM, Greg Traub  wrote:

> Vidur,
>
> Forgive me if I'm getting this wrong as I'm exceptionally new to Cassandra.
>
> By consistency, if you mean the USING CONSISTENCY clause, then I'm not
> specifying it which, per the CQL documentation, means a default of ONE.
>
> On Fri, Nov 6, 2015 at 1:49 PM, Vidur Malik  wrote:
>
>> What is your query consistency?
>>
>> On Fri, Nov 6, 2015 at 1:47 PM, Greg Traub 
>> wrote:
>>
>>> Cassandra users,
>>>
>>> I have a 4 node Cassandra cluster set up.  All nodes are in a single
>>> rack and distribution center.  I have a loader program which loads 40
>>> million rows into a table in a keyspace with a replication factor of 3.
>>> Immediately after inserting the rows (after the loader program finishes),
>>> if I SELECT count(*) from the table, the result is less than 40 million.
>>> If I run our dumper program to retrieve all rows, it is less than 40
>>> million.  However, if I wait roughly 20 minutes, the count eventually
>>> reaches 40 million rows and the dumper program returns all 40 million.
>>>
>>> If I do the same thing in a keyspace where the replication factor is 1,
>>> I don't have any "stabilization" time and the 40 million rows are
>>> immediately available.
>>>
>>> I've modified the loading and dumping programs to use both the Thrift
>>> Java driver and the CQL Java driver and neither seems to make a difference.
>>>
>>> I'm very new to Cassandra and my questions are, what may be causing this
>>> delay in all rows being available and how might I lessen/eliminate this
>>> delay?
>>>
>>> Thanks,
>>> Greg
>>>
>>
>>
>>
>> --
>>
>> Vidur Malik
>>
>> 800.820.9814
>>
>
>


Re: store avro to cassandra

2015-11-06 Thread Jack Krupansky
Use a Cassandra map column - the keys of the map can be arbitrary. But if
there are some standard set of columns that are always or commonly present,
make them explicit Cassandra columns. And if the avro values have different
types, you may want to have several Cassandra map columns, one for each
value type you need to distinguish in queries. In some cases you might want
to use user-defined types (UDTs).
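
For instance, a hypothetical sketch of that layout in cqlsh (keyspace, table,
and column names are made up):

cqlsh> CREATE TABLE myks.events (
   ...   id uuid PRIMARY KEY,
   ...   ts timestamp,                    -- a common field promoted to a real column
   ...   text_attrs   map<text, text>,    -- one map per value type
   ...   int_attrs    map<text, int>,
   ...   double_attrs map<text, double>
   ... );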

-- Jack Krupansky

On Fri, Nov 6, 2015 at 1:38 AM, Lu Niu  wrote:

>
> Hi, cassandra users
>
> my data is in avro format and the schema is huge. Is there any way that I
> can automatically convert the avro schema to the schema that cassandra
> could use? also, the api that I could store and fetch the data? Thank you!
>
> Best,
> Lu
>
>


Too many open files Cassandra 2.1.11.872

2015-11-06 Thread Jason Lewis
I'm getting too many open files errors and I'm wondering what the
cause may be.

lsof -n | grep java  shows 1.4M files

~90k are inodes
~70k are pipes
~500k are cassandra services in /usr
~700k are the data files.

What might be causing so many files to be open?

jas


Re: Too many open files Cassandra 2.1.11.872

2015-11-06 Thread Sebastian Estevez
You probably need to configure ulimits correctly.
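
For reference, the limits the DataStax docs recommend for production look
roughly like this (a sketch; goes in /etc/security/limits.d/cassandra.conf for
package installs, and assumes Cassandra runs as the cassandra user):

cassandra - memlock unlimited
cassandra - nofile  100000
cassandra - nproc   32768
cassandra - as      unlimited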

What does this give you?

/proc//limits


All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com








DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
DataStax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the world’s
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Fri, Nov 6, 2015 at 1:56 PM, Branton Davis 
wrote:

> We recently went down the rabbit hole of trying to understand the output
> of lsof.  lsof -n has a lot of duplicates (files opened by multiple
> threads).  Use 'lsof -p $PID' or 'lsof -u cassandra' instead.
>
> On Fri, Nov 6, 2015 at 12:49 PM, Bryan Cheng 
> wrote:
>
>> Is your compaction progressing as expected? If not, this may cause an
>> excessive number of tiny db files. Had a node refuse to start recently
>> because of this, had to temporarily remove limits on that process.
>>
>> On Fri, Nov 6, 2015 at 10:09 AM, Jason Lewis 
>> wrote:
>>
>>> I'm getting too many open files errors and I'm wondering what the
>>> cause may be.
>>>
>>> lsof -n | grep java  shows 1.4M files
>>>
>>> ~90k are inodes
>>> ~70k are pipes
>>> ~500k are cassandra services in /usr
>>> ~700k are the data files.
>>>
>>> What might be causing so many files to be open?
>>>
>>> jas
>>>
>>
>>
>