Re: Too many open files

2018-01-22 Thread Nikolay Mihaylov
You can increase the system open-files limit;
also, if you compact, the number of open files will go down.
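
For example, a rough sketch of checking and raising the limits (the user name, limit values and config paths below are assumptions - adjust to your distribution):

# per-process limit of the running daemon
cat /proc/$(pgrep -f CassandraDaemon)/limits | grep 'Max open files'

# raise the per-user limit (takes effect on the next login / service restart)
echo 'cassandra - nofile 100000' >> /etc/security/limits.conf

# raise the kernel-wide limit if needed
sysctl -w fs.file-max=1000000

# and run a major compaction so the sstable count (and open files) drops
nodetool compact <keyspace> <column_family>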

On Mon, Jan 22, 2018 at 10:19 AM, Dor Laor  wrote:

> It's a high number; your compaction may be running behind and thus
> many small sstables exist. However, you're also counting network
> connections in that number (everything
> in *nix is a file). If it makes you feel better, my laptop
> has 40k open files for Chrome...
>
> On Sun, Jan 21, 2018 at 11:59 PM, Andreou, Arys (Nokia - GR/Athens) <
> arys.andr...@nokia.com> wrote:
>
>> Hi,
>>
>>
>>
>> I keep getting a “Last error: Too many open files” followed by a list of
>> node IPs.
>>
>> The output of “lsof -n|grep java|wc -l” is about 674970 on each node.
>>
>>
>>
>> What is a normal number of open files?
>>
>>
>>
>> Thank you.
>>
>>
>>
>
>


Re: Production with Single Node

2016-01-27 Thread Nikolay Mihaylov
Hi

We have 2-3 installations with single-node Cassandra. They are working fine,
no problems there, except that if Cassandra stops, everything stops. Even on
one node we usually "roll" 500-600 GB of data, sometimes even 2-3 TB. We use
mostly the standard configuration with almost no changes.

Here are some considerations for bloom filter config, but they are for an old
Cassandra version:
http://nmmm.nu/bloomfilter.htm

https://whoisrequest.com/ - this runs on a single-node Cassandra with about
600 GB of data.

We found that it works much better and faster than MySQL. We did test
Postgres, but it was terribly slow. We were in a big hurry, so we did not
analyze why Postgres was so slow.

Another lesson we learned - when you run a single node, put only Cassandra on
that server. Keep the webserver / client on a different server.

In our latest project we used TokuDB. It is something like a MySQL
"plugin". We have known Toku for 5-6 years, but until recently it was paid
software with a free demo. TokuDB is currently GPL.

Here is what we researched 5 years ago:

http://www.novini.net/2010/12/mysql-storage-engines-comparison.html

We also tested MongoDB. It is quite fast, but it was eating our HDD space
very fast.

So, a little recap of what we have:

- Cassandra single nodes - 600-700 GB data
- MySQL with MyISAM - 30-40 GB data
- TokuDB - 100 GB data (this is equivalent to roughly 500 GB in MyISAM / InnoDB).

Feel free to contact me if you have non-Cassandra-related questions.


On Sat, Jan 23, 2016 at 7:10 AM, Anuj Wadehra 
wrote:

> And I think in a 3 node cluster, RAID 0 would do the job instead of RAID 5.
> So you will need less storage to get the same disk space. But you will get
> protection against disk failures and in fact entire node failure.
>
> Anuj
>
> Sent from Yahoo Mail on Android
> 
>
> On Sat, 23 Jan, 2016 at 10:30 am, Anuj Wadehra
>  wrote:
> I think Jonathan said it earlier. You may be happy with the performance
> for now as you are using the same commitlog settings that you use in large
> clusters. Test the recommended new setting so that you know the real
> picture. Or be prepared to lose some data in case of failure.
>
> Other than durability, your single node cluster would be a Single Point of
> Failure for your site. RAID 5 will only protect you against a disk failure,
> but a server may be down for other reasons too. The question is: are you OK
> with the site going down?
>
> I would suggest you use hardware with a smaller configuration to save on
> cost for smaller sites and go ahead with a 3 node minimum. That way you
> will provide all the good features of your design irrespective of the site.
> Cassandra is known to work on commodity servers too.
>
>
>
> Thanks
> Anuj
>
>
>
>
> Sent from Yahoo Mail on Android
> 
>
> On Sat, 23 Jan, 2016 at 4:23 am, Jack Krupansky
>  wrote:
> You do of course have the simple technical matters, most of which need to
> be addressed with a proof of concept implementation, related to memory,
> storage, latency, and throughput. I mean, with a scaled cluster you can
> always add nodes to increase capacity and throughput, and reduce latency,
> but with a single node you have limited flexibility.
>
> Just to be clear, Cassandra is still not recommended for "fat nodes" -
> even if you can fit tons of data on the node, you may not have the compute
> to satisfy throughput and latency requirements. And if you don't have
> enough system memory the amount of storage is irrelevant.
>
> Back to my original question:
> How much data (rows, columns), what kind of load pattern (heavy write,
> heavy update, heavy query), and what types of queries (primary key-only,
> slices, filtering, secondary indexes, etc.)?
>
> I do recall a customer who ran into problems because they had SSD but only
> a very limited amount so they were running out of storage. Having enough
> system memory for file system caching and offheap data is important as well.
>
>
> -- Jack Krupansky
>
> On Fri, Jan 22, 2016 at 5:07 PM, John Lammers <
> john.lamm...@karoshealth.com> wrote:
>
>> Thanks for your response Jack.
>>
>> We are already sold on distributed databases, HA and scaling.  We just
>> have some small deployments coming up where there's no money for servers to
>> run multiple Cassandra nodes.
>>
>> So, aside from the lack of HA, I'm asking if a single Cassandra node
>> would be viable in a production environment.  (There would be RAID 5 and
>> the RAID controller cache is backed by flash memory).
>>
>> I'm asking because I'm concerned about using Cassandra in a way that it's
>> not designed for.  That to me is the unsettling aspect.
>>
>> If this is a bad idea, give me the ammo I need to shoot it down.  I need
>> specific technical reasons.
>>
>> Thanks!
>>
>> --John
>>
>> On Fri, Jan 22, 2016 at 4:47 PM, Jack Krupansky > > wrote:
>>
>>> Is 

sstable structure

2015-01-02 Thread Nikolay Mihaylov
Hi

For some time I have been trying to find the structure of an sstable. Is it
documented somewhere, or can anyone explain it to me?

I am speaking about the hex dump of the bytes stored on the disk.

Nick.


Re: Tombstones without DELETE

2015-01-02 Thread Nikolay Mihaylov
Hi Tyler,

sorry for a very stupid question - what is a collection?

Nick

On Wed, Dec 31, 2014 at 6:27 PM, Tyler Hobbs ty...@datastax.com wrote:

 Overwriting an entire collection also results in a tombstone being
 inserted.
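
A tiny illustration of that point (keyspace and table are made up for the example; assumes CQL3 collections, i.e. Cassandra 1.2+):

cqlsh <<'EOF'
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, emails set<text>);

-- full overwrite of the collection: a range tombstone is written first to
-- clear whatever was there, then the new value is inserted
UPDATE demo.users SET emails = {'a@example.com'} WHERE id = 1;

-- adding to the collection instead of replacing it does not need a tombstone
UPDATE demo.users SET emails = emails + {'b@example.com'} WHERE id = 1;
EOF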

 On Wed, Dec 24, 2014 at 7:09 AM, Ryan Svihla rsvi...@datastax.com wrote:

 You should probably ask on the Cassandra user mailling list.

 However, TTL is the only other case I can think of.

 On Tue, Dec 23, 2014 at 1:36 PM, Davide D'Agostino i...@daddye.it
 wrote:

 Hi there,

 Following this:
 https://groups.google.com/a/lists.datastax.com/forum/#!searchin/java-driver-user/tombstone/java-driver-user/cHE3OOSIXBU/moLXcif1zQwJ

 Under what conditions does Cassandra generate a tombstone?

 Basically I have a not-even-big table in Cassandra (90M rows); in my code
 there is no delete, and I use prepared statements (binding all necessary
 values).

 I'm aware that a tombstone gets created when:

 1. You delete the row
 2. You set a column to null while previously it had a value
 3. When you use prepared statements and you don't bind all the values

 Anything else that I should be aware of?

 Thanks!





 --

 http://www.datastax.com/

 Ryan Svihla

 Solution Architect

 https://twitter.com/foundev
 http://www.linkedin.com/pub/ryan-svihla/12/621/727/




 --
 Tyler Hobbs
 DataStax http://datastax.com/



Re: disk space issue

2014-10-01 Thread Nikolay Mihaylov
my 2 cents:

try a major compaction on the column family with the TTLs - it will surely be
faster than a full rebuild.

also try non-Cassandra-related things, such as checking for and removing old
log files, backups, etc.
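
For example (keyspace / column family names are placeholders; the data path assumes a default install):

# major compaction of the column family holding the TTL'd data; expired
# data whose gc_grace has passed gets purged in the process
nodetool compact <keyspace> <column_family>

# snapshots and old backups often hold a surprising amount of space
nodetool clearsnapshot
du -sh /var/lib/cassandra/data/*/*/snapshots 2>/dev/null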

On Wed, Oct 1, 2014 at 9:34 AM, Sumod Pawgi spa...@gmail.com wrote:

 In the past in such scenarios it has helped us to check the partition
 where cassandra is installed and allocate more space for that partition.
 Maybe it is a disk space issue, but it is good to check whether it is related
 to the space allocated to the partition. My 2 cents.

 Sent from my iPhone

 On 01-Oct-2014, at 11:53 am, Dominic Letz dominicl...@exosite.com wrote:

 This is a shot in the dark, but you could check whether you have too many
 snapshots lying around that you actually don't need. You can get rid of
 those with a quick nodetool clearsnapshot.

 On Wed, Oct 1, 2014 at 5:49 AM, cem cayiro...@gmail.com wrote:

 Hi All,

 I have a 7 node cluster. One node ran out of disk space and the others are
 around 80% disk utilization.
 The data has a 10 day TTL, but I think compaction wasn't fast enough to
 clean up the expired data. The gc_grace value is set to the default. I have a
 replication factor of 3. Do you think it may help if I delete all the data
 for that node and run repair? Does node repair check the TTL value before
 retrieving data from other nodes? Do you have any other suggestions?

 Best Regards,
 Cem.




 --
 Dominic Letz
 Director of R&D
 Exosite http://exosite.com




Re: Cassandra 2.0.7 always fails due to 'too many open files' error

2014-05-15 Thread Nikolay Mihaylov
sorry, probably somebody already mentioned it, but did you check the global limit?

cat /proc/sys/fs/file-max
cat /proc/sys/fs/file-nr



On Mon, May 5, 2014 at 10:31 PM, Bryan Talbot bryan.tal...@playnext.com wrote:

 Running

 # cat /proc/$(cat /var/run/cassandra.pid)/limits

 as root or your cassandra user will tell you what limits it's actually
 running with.




 On Sun, May 4, 2014 at 10:12 PM, Yatong Zhang bluefl...@gmail.com wrote:

 I was running 'repair' when the error occurred. And just a few days before,
 I had changed the compaction strategy to 'leveled'. Don't know if this helps.


 On Mon, May 5, 2014 at 1:10 PM, Yatong Zhang bluefl...@gmail.com wrote:

 Cassandra is running as root

 [root@storage5 ~]# ps aux | grep java
 root  1893 42.0 24.0 7630664 3904000 ? Sl   10:43  60:01 java
 -ea -javaagent:/mydb/cassandra/bin/../lib/jamm-0.2.5.jar
 -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities
 -XX:ThreadPriorityPolicy=42 -Xms3959M -Xmx3959M -Xmn400M
 -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=103
 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
 -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
 -XX:+UseTLAB -XX:+UseCondCardMark -Djava.net.preferIPv4Stack=true
 -Dcom.sun.management.jmxremote.port=7199
 -Dcom.sun.management.jmxremote.ssl=false
 -Dcom.sun.management.jmxremote.authenticate=false
 -Dlog4j.configuration=log4j-server.properties
 -Dlog4j.defaultInitOverride=true -Dcassandra-pidfile=/var/run/cassandra.pid
 -cp
 /mydb/cassandra/bin/../conf:/mydb/cassandra/bin/../build/classes/main:/mydb/cassandra/bin/../build/classes/thrift:/mydb/cassandra/bin/../lib/antlr-3.2.jar:/mydb/cassandra/bin/../lib/apache-cassandra-2.0.7.jar:/mydb/cassandra/bin/../lib/apache-cassandra-clientutil-2.0.7.jar:/mydb/cassandra/bin/../lib/apache-cassandra-thrift-2.0.7.jar:/mydb/cassandra/bin/../lib/commons-cli-1.1.jar:/mydb/cassandra/bin/../lib/commons-codec-1.2.jar:/mydb/cassandra/bin/../lib/commons-lang3-3.1.jar:/mydb/cassandra/bin/../lib/compress-lzf-0.8.4.jar:/mydb/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.3.jar:/mydb/cassandra/bin/../lib/disruptor-3.0.1.jar:/mydb/cassandra/bin/../lib/guava-15.0.jar:/mydb/cassandra/bin/../lib/high-scale-lib-1.1.2.jar:/mydb/cassandra/bin/../lib/jackson-core-asl-1.9.2.jar:/mydb/cassandra/bin/../lib/jackson-mapper-asl-1.9.2.jar:/mydb/cassandra/bin/../lib/jamm-0.2.5.jar:/mydb/cassandra/bin/../lib/jbcrypt-0.3m.jar:/mydb/cassandra/bin/../lib/jline-1.0.jar:/mydb/cassandra/bin/../lib/json-simple-1.1.jar:/mydb/cassandra/bin/../lib/libthrift-0.9.1.jar:/mydb/cassandra/bin/../lib/log4j-1.2.16.jar:/mydb/cassandra/bin/../lib/lz4-1.2.0.jar:/mydb/cassandra/bin/../lib/metrics-core-2.2.0.jar:/mydb/cassandra/bin/../lib/netty-3.6.6.Final.jar:/mydb/cassandra/bin/../lib/reporter-config-2.1.0.jar:/mydb/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/mydb/cassandra/bin/../lib/slf4j-api-1.7.2.jar:/mydb/cassandra/bin/../lib/slf4j-log4j12-1.7.2.jar:/mydb/cassandra/bin/../lib/snakeyaml-1.11.jar:/mydb/cassandra/bin/../lib/snappy-java-1.0.5.jar:/mydb/cassandra/bin/../lib/snaptree-0.1.jar:/mydb/cassandra/bin/../lib/super-csv-2.1.0.jar:/mydb/cassandra/bin/../lib/thrift-server-0.3.3.jar
 org.apache.cassandra.service.CassandraDaemon




 On Mon, May 5, 2014 at 1:02 PM, Philip Persad 
 philip.per...@gmail.com wrote:

 Have you tried running ulimit -a as the Cassandra user instead of as
 root? It is possible that you configured a high file limit for root but
 not for the user running the Cassandra process.


 On Sun, May 4, 2014 at 6:07 PM, Yatong Zhang bluefl...@gmail.com wrote:

 [root@storage5 ~]# lsof -n | grep java | wc -l
 5103
 [root@storage5 ~]# lsof | wc -l
 6567


 It's mentioned in the previous mail :)


 On Mon, May 5, 2014 at 9:03 AM, nash nas...@gmail.com wrote:

 The lsof command or /proc can tell you how many open files it has.
 How many is it?

 --nash










Re: Avoiding email duplicates when registering users

2014-05-13 Thread Nikolay Mihaylov
the real question is - if you want the email to be unique, why use a
surrogate primary key such as a UUID?

I wonder what the UUID gives you at all.

If you want to have a non-email primary key, why not use md5(email)?




On Wed, May 7, 2014 at 2:19 AM, Tyler Hobbs ty...@datastax.com wrote:


 On Mon, May 5, 2014 at 10:27 AM, Ignacio Martin natx...@gmail.com wrote:


 When a user registers, the server generates a UUID and performs an INSERT
 ... IF NOT EXISTS into the email_to_UUID table. Immediately after, it performs
 a SELECT from the same table and checks whether the UUID it reads is the same
 as the one we just generated. If it is, we are allowed to INSERT the data in
 the user table, knowing that no one else will be doing it.


 INSERT ... IF NOT EXISTS is the correct thing to do here, but you don't
 need to SELECT afterwards.  If the row does exist, the query results will
 show that the insert was not applied and the existing row will be returned.
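
A minimal sketch of that flow (keyspace, table and values below are made up for the example):

cqlsh <<'EOF'
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
CREATE TABLE IF NOT EXISTS demo.email_to_uuid (email text PRIMARY KEY, user_id uuid);

-- first registration: the result row comes back with [applied] = True
INSERT INTO demo.email_to_uuid (email, user_id)
VALUES ('alice@example.com', 11111111-2222-3333-4444-555555555555)
IF NOT EXISTS;

-- a second attempt with the same email: [applied] = False, and the existing
-- row (with its stored user_id) is returned in the same result set, so no
-- follow-up SELECT is needed
INSERT INTO demo.email_to_uuid (email, user_id)
VALUES ('alice@example.com', 66666666-7777-8888-9999-000000000000)
IF NOT EXISTS;
EOF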


 --
 Tyler Hobbs
 DataStax http://datastax.com/



Re: DELETE does not delete :)

2013-10-07 Thread Nikolay Mihaylov
Hi

my two cents - before doing anything else, make sure the clocks are
synchronized to the millisecond.
NTP will do that.

Nick.
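
For example (host names are placeholders):

# show the configured peers and the current offset on this node
ntpq -p

# or simply compare wall clocks across the cluster in one shot
for h in node1 node2 node3; do echo -n "$h: "; ssh "$h" date +%s.%N; done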


On Mon, Oct 7, 2013 at 9:02 AM, Alexander Shutyaev shuty...@gmail.com wrote:

 Hi all,

 We have encountered the following problem with cassandra.

 * We use *cassandra v2.0.0* from *Datastax* community repo.

 * We have *3 nodes* in a cluster, all of them are seed providers.

 * We have a *single keyspace* with *replication factor = 3*:

 *CREATE KEYSPACE bof WITH replication = {*
 *  'class': 'SimpleStrategy',*
 *  'replication_factor': '3'*
 *};*

 * We use *Datastax Java CQL Driver v1.0.3* in our application.

 * We have not modified any *consistency settings* in our app, so I assume
 we have the *default QUORUM* (2 out of 3 in our case) consistency *for
 reads and writes*.

 * We have 400+ tables which can be divided in two groups (*main* and *uids
 *). All tables in a group have the same definition, they vary only by
 name. The sample definitions are:

 *CREATE TABLE bookingfile (*
 *  key text,*
 *  entity_created timestamp,*
 *  entity_createdby text,*
 *  entity_entitytype text,*
 *  entity_modified timestamp,*
 *  entity_modifiedby text,*
 *  entity_status text,*
 *  entity_uid text,*
 *  entity_updatepolicy text,*
 *  version_created timestamp,*
 *  version_createdby text,*
 *  version_data blob,*
 *  version_dataformat text,*
 *  version_datasource text,*
 *  version_modified timestamp,*
 *  version_modifiedby text,*
 *  version_uid text,*
 *  version_versionnotes text,*
 *  version_versionnumber int,*
 *  versionscount int,*
 *  PRIMARY KEY (key)*
 *) WITH*
 *  bloom_filter_fp_chance=0.01 AND*
 *  caching='KEYS_ONLY' AND*
 *  comment='' AND*
 *  dclocal_read_repair_chance=0.00 AND*
 *  gc_grace_seconds=864000 AND*
 *  index_interval=128 AND*
 *  read_repair_chance=0.10 AND*
 *  replicate_on_write='true' AND*
 *  populate_io_cache_on_flush='false' AND*
 *  default_time_to_live=0 AND*
 *  speculative_retry='NONE' AND*
 *  memtable_flush_period_in_ms=0 AND*
 *  compaction={'class': 'SizeTieredCompactionStrategy'} AND*
 *  compression={'sstable_compression': 'LZ4Compressor'};*

 *CREATE TABLE bookingfile_uids (*
 *  date text,*
 *  timeanduid text,*
 *  deleted boolean,*
 *  PRIMARY KEY (date, timeanduid)*
 *) WITH*
 *  bloom_filter_fp_chance=0.01 AND*
 *  caching='KEYS_ONLY' AND*
 *  comment='' AND*
 *  dclocal_read_repair_chance=0.00 AND*
 *  gc_grace_seconds=864000 AND*
 *  index_interval=128 AND*
 *  read_repair_chance=0.10 AND*
 *  replicate_on_write='true' AND*
 *  populate_io_cache_on_flush='false' AND*
 *  default_time_to_live=0 AND*
 *  speculative_retry='NONE' AND*
 *  memtable_flush_period_in_ms=0 AND*
 *  compaction={'class': 'SizeTieredCompactionStrategy'} AND*
 *  compression={'sstable_compression': 'LZ4Compressor'};*
 *
 *
 *CREATE INDEX BookingFile_uids_deleted ON bookingfile_uids (deleted);*

 * We don't have any problems with the tables from the *main* group.

 * As for the tables from the *uids* group we have noticed that sometimes
 deletes from these tables do not do their job. They don't fail, they just
 do nothing. We have confirmed this by adding a select query after deletes.
 Most times everything is ok and select returns 0 records. But sometimes (~5
 out of 100,000) it returns the supposedly deleted row.

 * We have logged the ExecutionInfo objects with query tracing that are
 returned by Datastax's driver. Here are the details

 *DELETE FROM bookingfile_uids WHERE date=C20131006 AND
 timeAndUid=195248590_4762ce41-d2d2-448d-be8c-c7fcb6b7394e*

 *ExecutionInfo: [*
 *triedHosts=/10.10.30.23;*
 *queriedHost=/10.10.30.23;*
  *achievedConsistencyLevel=null;*
 *queryTrace=*
 * Message received from /10.10.30.23 on /10.10.30.19[Thread-56] at Sun
 Oct 06 19:55:57 MSK 2013*
 * Acquiring switchLock read lock on /10.10.30.19[MutationStage:49] at Sun
 Oct 06 19:55:57 MSK 2013*
 * Appending to commitlog on /10.10.30.19[MutationStage:49] at Sun Oct 06
 19:55:57 MSK 2013*
 * Adding to bookingfile_uids memtable on /10.10.30.19[MutationStage:49]
 at Sun Oct 06 19:55:57 MSK 2013*
 * Enqueuing response to /10.10.30.23 on /10.10.30.19[MutationStage:49] at
 Sun Oct 06 19:55:57 MSK 2013*
 * Sending message to /10.10.30.23 on /10.10.30.19[WRITE-/10.10.30.23] at
 Sun Oct 06 19:55:57 MSK 2013*
 * Message received from /10.10.30.23 on /10.10.30.20[Thread-34] at Sun
 Oct 06 19:55:57 MSK 2013*
 * Acquiring switchLock read lock on /10.10.30.20[MutationStage:43] at Sun
 Oct 06 19:55:57 MSK 2013*
 * Appending to commitlog on /10.10.30.20[MutationStage:43] at Sun Oct 06
 19:55:57 MSK 2013*
 * Adding to bookingfile_uids memtable on /10.10.30.20[MutationStage:43]
 at Sun Oct 06 19:55:57 MSK 2013*
 * Enqueuing response to /10.10.30.23 on /10.10.30.20[MutationStage:43] at
 Sun Oct 06 19:55:57 MSK 2013*
 * Sending message to /10.10.30.23 on /10.10.30.20[WRITE-/10.10.30.23] at
 Sun Oct 06 19:55:57 MSK 2013*
 * Determining replicas for mutation on 
 

cassandra high bandwidth

2013-09-10 Thread Nikolay Mihaylov
Hi,

we have cassandra 1.2.6, single node.

we have a website there, running on a different server.

recently we noticed that we have 40 Mbit of traffic from the cassandra server
to the web server.

we use phpcassa.

on OpsCenter we have a KeyCache Hits value of around 2000.

I found the most used CFs from nodetool, but I do not think the traffic
problem is there.

Is there a way I can find out what those keycache hits are?

Nick.


Re: Random Distribution, yet Order Preserving Partitioner

2013-08-23 Thread Nikolay Mihaylov
It can handle some millions of columns, but not more like 10M. I mean, a
request for such a row concentrates on a particular node, so the
performance degrades.

 I also had idea for semi-ordered partitioner - instead of single MD5, to
have two MD5's.

works for us with a wide row of about 40-50 M columns, but with lots of problems.

my research with get_count() shows the first minor problems at 14-15K columns
in a row, and then it just gets worse.




On Fri, Aug 23, 2013 at 2:47 AM, Takenori Sato ts...@cloudian.com wrote:

 Hi Nick,

  token and key are not same. it was like this long time ago (single MD5
 assumed single key)

 True. That reminds me of making a test with the latest 1.2 instead of our
 current 1.0!

  if you want ordered, you probably can arrange your data in a way so you
 can get it in ordered fashion.

 Yeah, we have done that for a long time. That's called a wide row, right? Or a
 compound primary key.

 It can handle some millions of columns, but not more like 10M. I mean, a
 request for such a row concentrates on a particular node, so the
 performance degrades.

  I also had idea for semi-ordered partitioner - instead of single MD5,
 to have two MD5's.

 Sounds interesting. But, we need a fully ordered result.

 Anyway, I will try with the latest version.

 Thanks,
 Takenori


 On Thu, Aug 22, 2013 at 6:12 PM, Nikolay Mihaylov n...@nmmm.nu wrote:

 my five cents -
 the token and the key are not the same. It was like this a long time ago (a
 single MD5 assumed a single key).

 if you want ordered results, you can probably arrange your data in a way so
 you can get it back in ordered fashion.
 for example, long ago I had a single column family with a single key and
 about 2-3 M columns - I do not suggest you do it this way, because it is the
 wrong way, but it is easy to understand the idea.

 I also had an idea for a semi-ordered partitioner - instead of a single MD5,
 to have two MD5's.
 then you can get semi-ordered ranges, e.g. you get, in order, all cities in
 Canada, all cities in the US, and so on.
 however, in this way things may get pretty unbalanced.

 Nick





 On Thu, Aug 22, 2013 at 11:19 AM, Takenori Sato ts...@cloudian.com wrote:

 Hi,

 I am trying to implement a custom partitioner that evenly distributes,
 yet preserves order.

 The partitioner returns a token by BigInteger as RandomPartitioner does,
 while does a decorated key by string as OrderPreservingPartitioner does.
 * for now, since IPartitioner<T> does not support different types for
 token and key, BigInteger is simply converted to string

 Then, I played around with cassandra-cli. As expected, in my 3 nodes
 test cluster, get/set worked, but list(get_range_slices) didn't.

 This came from a challenge to overcome a wide row scalability. So, I
 want to make it work!

 I am aware that some efforts are required to make get_range_slices work.
 But are there any other critical problems? For example, it seems there is
 an assumption that token and key are the same. If this is throughout the
 whole C* code, this partitioner is not practical.

 Or have you tried something similar?

 I would appreciate your feedback!

 Thanks,
 Takenori






Re: Random Distribution, yet Order Preserving Partitioner

2013-08-22 Thread Nikolay Mihaylov
my five cents -
the token and the key are not the same. It was like this a long time ago (a
single MD5 assumed a single key).

if you want ordered results, you can probably arrange your data in a way so
you can get it back in ordered fashion.
for example, long ago I had a single column family with a single key and about
2-3 M columns - I do not suggest you do it this way, because it is the wrong
way, but it is easy to understand the idea.

I also had an idea for a semi-ordered partitioner - instead of a single MD5, to
have two MD5's.
then you can get semi-ordered ranges, e.g. you get, in order, all cities in
Canada, all cities in the US, and so on.
however, in this way things may get pretty unbalanced.

Nick
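
In CQL terms the "arrange your data" idea looks roughly like this (table is made up for the example; with the random/Murmur3 partitioner the partitions themselves stay unordered, but rows inside one partition come back in clustering order):

cqlsh <<'EOF'
CREATE KEYSPACE demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};

CREATE TABLE demo.cities (
  country    text,   -- partition key: hashed, so countries are NOT ordered
  city       text,   -- clustering column: ordered within each country
  population int,
  PRIMARY KEY (country, city)
);

-- all cities in Canada come back ordered by city, served from one partition
SELECT city, population FROM demo.cities WHERE country = 'Canada';
EOF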





On Thu, Aug 22, 2013 at 11:19 AM, Takenori Sato ts...@cloudian.com wrote:

 Hi,

 I am trying to implement a custom partitioner that evenly distributes, yet
 preserves order.

 The partitioner returns a token by BigInteger as RandomPartitioner does,
 while does a decorated key by string as OrderPreservingPartitioner does.
 * for now, since IPartitioner<T> does not support different types for
 token and key, BigInteger is simply converted to string

 Then, I played around with cassandra-cli. As expected, in my 3 nodes test
 cluster, get/set worked, but list(get_range_slices) didn't.

 This came from a challenge to overcome a wide row scalability. So, I want
 to make it work!

 I am aware that some efforts are required to make get_range_slices work.
 But are there any other critical problems? For example, it seems there is
 an assumption that token and key are the same. If this is throughout the
 whole C* code, this partitioner is not practical.

 Or have you tried something similar?

 I would appreciate your feedback!

 Thanks,
 Takenori



cassandra disk access

2013-08-07 Thread Nikolay Mihaylov
Hi

I am researching various hash-tables and b-trees on disk.

while researching, I had some thoughts about cassandra sstables that I want
to verify here.

1. a cassandra sstable uses sequential disk I/O when created, i.e. the disk
head writes it from the beginning to the end. Assuming the disk is not
fragmented, the sstable is placed on disk sectors one after the other.

2. when cassandra looks up a key in an sstable (assuming the bloom filter and
other structures did not rule the key out, and assuming the key is located in
this single sstable), cassandra DOES NOT USE sequential I/O. It will probably
read the hash-table slot or a similar structure, then do another disk seek in
order to get the value (and probably the key). There will probably also be
another seek, and if there is a key collision, additional seeks will be
needed.

3. once the data (e.g. the row) is located, a sequential read of the entire
row will occur. (Once again I assume there is a single, well-compacted
sstable.) Also, if the disk is not fragmented, the data will be placed on disk
sectors one after the other.

Am I wrong?

Nick.


Re: cassandra disk access

2013-08-07 Thread Nikolay Mihaylov
thanks

It will use the Index Sample (RAM) first, then it will use full Index
(disk) and finally it will read data from SSTable (disk). There's no such
thing like collision in this case.

so it still has 2 seeks :)

Where can I see the internal structure of the sstable? I tried to find it
documented but was unable to find anything.




On Wed, Aug 7, 2013 at 11:27 AM, Michał Michalski mich...@opera.com wrote:


  2. when cassandra lookups a key in sstable (assuming bloom-filter and
 other
 stuff failed, also assuming the key is located in this single sstable),
 cassandra DO NOT USE sequential I/O. She probably will read the
 hash-table slot or similar structure, then cassandra will do another disk
 seek in order to get the value (and probably the key). Also probably there
 will need another seek, if there is key collision there will need
 additional seeks.


 It will use the Index Sample (RAM) first, then it will use full Index
 (disk) and finally it will read data from SSTable (disk). There's no such
 thing like collision in this case.


  3. once the data (e.g. the row) is located, a sequential read for entire
 row will occur. (Once again I assume there is single well compacted
 sstable). Also if disk is not fragmented, the data will be placed on disk
 sectors one after the other.


 Yes, this is how I understand it too.

 M.




Re: unable to delete

2013-06-07 Thread Nikolay Mihaylov
Hi

please note that when you drop a column family, the data on the disk is not
deleted.

this is something you should do yourself.

 Do the files get deleted on GC/server restart?
the question actually translates to - does the column family still exist after
the restart?

John, please correct me if I am explaining it wrong.

Nick.


On Mon, Jun 3, 2013 at 10:14 PM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Jun 3, 2013 at 11:57 AM, John R. Frank j...@mit.edu wrote:
  Is it considered normal for cassandra to experience this error:
 
  ERROR [NonPeriodicTasks:1] 2013-06-03 18:17:05,374
 SSTableDeletingTask.java
  (line 72) Unable to delete
  /raid0/cassandra/data/KEYSPACE/CF/KEYSPACE-CF-ic-19-Data.db (it
 will
  be removed on server restart; we'll also retry after GC)


 cassandra//src/java/org/apache/cassandra/io/sstable/SSTableDeletingTask.java
 
File datafile = new File(desc.filenameFor(Component.DATA));
 if (!datafile.delete())
 {
 logger.error("Unable to delete " + datafile +
  " (it will be removed on server restart; we'll also retry after GC)");
 failedTasks.add(this);
 return;
 }
 

 There are contexts where it is appropriate for Cassandra to be unable
 to delete a file using io.File.delete.

 
 // Deleting sstables is tricky because the mmapping might not have
 been finalized yet,
 // and delete will fail (on Windows) until it is (we only force the
 unmapping on SUN VMs).
 // Additionally, we need to make sure to delete the data file first,
 so on restart the others
 // will be recognized as GCable.
 

 Do the files get deleted on GC/server restart?

  This is on the DataStax EC2 AMI in a two-node cluster.  After deleting
 1,000
  rows from a CF with 20,000 rows, the DB becomes slow, and I'm trying to
  figure out why.  Could this error message be pointing at a proximate
 cause?

 Almost certainly not. By the time that a sstable file is subject to
 deletion, it should no longer be live. When it is no longer live
 it is not in the read path.

 You can verify this by using nodetool getsstables on a given key.
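
For example (keyspace, column family and key are placeholders):

# lists the live sstables that would be consulted for this key on this node
nodetool getsstables <keyspace> <column_family> <key>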

 What operation are you trying to do when the DB becomes slow?

 =Rob



Re: column with TTL of 10 seconds lives very long...

2013-05-23 Thread Nikolay Mihaylov
Did you synchronize the clocks between the servers?


On Thu, May 23, 2013 at 9:32 AM, Tamar Fraenkel ta...@tok-media.com wrote:

 Hi!
 I have Cassandra cluster with 3 node running version 1.0.11.

 I am using Hector HLockManagerImpl, which creates a keyspace named
 HLockManagerImpl and CF HLocks.
 For some reason I have a row with a single column that should have expired
 yesterday but is still there.
 I tried deleting it using the cli, but it is stuck...
 Any ideas how to delete it?

 Thanks,

 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media


 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956




Re: nodetool ring generate strange info

2013-05-10 Thread Nikolay Mihaylov
Do you use vnodes?
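
A quick way to check (config path assumes a package install):

# with vnodes, each node owns num_tokens ranges, so "nodetool ring" prints
# one line per token instead of one per host
grep num_tokens /etc/cassandra/cassandra.yaml

# "nodetool status" gives the one-line-per-host summary instead
nodetool status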


On Fri, May 10, 2013 at 10:19 AM, 杨辉强 huiqiangy...@yunrang.com wrote:

 Hi, all
   I use ./bin/nodetool -h 10.21.229.32  ring

 It generates lots of entries for the same host, like this:

 10.21.229.32  rack1  Up  Normal  928.3 MB  24.80%  8875305964978355793
 10.21.229.32  rack1  Up  Normal  928.3 MB  24.80%  8875770246221977199
 10.21.229.32  rack1  Up  Normal  928.3 MB  24.80%  8875903273282028661
 10.21.229.32  rack1  Up  Normal  928.3 MB  24.80%  9028992266297813652
 10.21.229.32  rack1  Up  Normal  928.3 MB  24.80%  9130157610675408105
 10.21.229.32  rack1  Up  Normal  928.3 MB  24.80%  9145604352014775913
 10.21.229.32  rack1  Up  Normal  928.3 MB  24.80%  9182228238626921304

 Is this normal?



Re: differences between DataStax Community Edition and Cassandra package

2013-04-19 Thread Nikolay Mihaylov
Are there alternative file systems running on top of cassandra?



On Fri, Apr 19, 2013 at 7:44 PM, Robert Coli rc...@eventbrite.com wrote:

 On Fri, Apr 19, 2013 at 4:18 AM, Goktug YILDIRIM
 goktug.yildi...@gmail.com wrote:
 
  I am sorry if this is a very well known topic and I missed it. I wonder
 why one must use CFS. What is unavailable in Cassandra without CFS?


 The SOLR stuff (also only-in-DSE) uses CFS for Hadoop-like storage.
 This is so you can use full SOLR support with DSE Cassandra without
 needing Hadoop.

 http://www.datastax.com/dev/blog/cassandra-file-system-design
 
 The Cassandra File System (CFS) is an HDFS compatible filesystem built
 to replace the traditional Hadoop NameNode, Secondary NameNode and
 DataNode daemons. It is the foundation of our Hadoop support in
 DataStax Enterprise.

 The main design goals for the Cassandra File System were to first,
 simplify the operational overhead of Hadoop by removing the single
 points of failure in the Hadoop NameNode. Second, to offer easy Hadoop
 integration for Cassandra users (one distributed system is enough)
 

 =Rob



Re: differences between DataStax Community Edition and Cassandra package

2013-04-18 Thread Nikolay Mihaylov
What's CDFS? I am sure you are not referring to iso9660, i.e. the CD-ROM
filesystem? :)


On Wed, Apr 17, 2013 at 10:42 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Apr 17, 2013 at 11:19 AM, aaron morton aa...@thelastpickle.com wrote:

 It's the same as the Apache version, but DSC comes with samples and the
 free version of Ops Centre.


 DSE also comes with Solr special sauce and CDFS.

 =Rob




Re: differences between DataStax Community Edition and Cassandra package

2013-04-18 Thread Nikolay Mihaylov
I thought so.
Sorry to ask in this thread, but for some time I have been wondering how CFS
can be installed on normal Cassandra?


On Thu, Apr 18, 2013 at 3:23 PM, Michal Michalski mich...@opera.com wrote:

 Probably Robert meant CFS:
 http://www.datastax.com/wp-content/uploads/2012/09/WP-DataStax-HDFSvsCFS.pdf :-)

 On 18.04.2013 14:10, Nikolay Mihaylov wrote:

  whats CDFS ? I am sure you are not referring iso9660, e.g. CD-ROM
 filesystem? :)


 On Wed, Apr 17, 2013 at 10:42 PM, Robert Coli rc...@eventbrite.com
 wrote:

  On Wed, Apr 17, 2013 at 11:19 AM, aaron morton aa...@thelastpickle.com
 wrote:

  It's the same as the Apache version, but DSC comes with samples and the
 free version of Ops Centre.


  DSE also comes with Solr special sauce and CDFS.

 =Rob







Re: running cassandra on 8 GB servers

2013-04-15 Thread Nikolay Mihaylov
Just a small update here:
currently running on one node with a 7 GB heap and no JNA,
all defaults except the heap, and everything looks OK.
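
For reference, the change amounts to something like this in cassandra-env.sh (the young-generation size for the 7 GB run isn't stated above, so the value below is just the 400 MB used earlier in this thread):

# cassandra-env.sh - override the automatic heap sizing
MAX_HEAP_SIZE="7G"
HEAP_NEWSIZE="400M"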

On Sun, Apr 14, 2013 at 9:10 PM, aaron morton aa...@thelastpickle.com wrote:

 Hmmm, what is the recommendation for a 10G network if 1G was 300G to
 500G… I am guessing I can't do 10 times that, correct?  But maybe I could
 squeak out 600G to 1T?

 Best thing to do would be to run a test on how long it takes to repair or
 bootstrap a node. The 300GB to 500GB was just a guideline.

 Cheers

-
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 13/04/2013, at 12:02 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 Hmmm, what is the recommendation for a 10G network if 1G was 300G to
 500G… I am guessing I can't do 10 times that, correct?  But maybe I could
 squeak out 600G to 1T?

 Thanks,
 Dean

 On 4/11/13 2:26 PM, aaron morton aa...@thelastpickle.com wrote:

 The data will be huge, I am estimating 4-6 TB per server. I know this
 is best, but those are my resources.

 You will have a very unhappy time.

 The general rule of thumb / guideline for a HDD based system with 1G
 networking is 300GB to 500Gb per node. See previous discussions on this
 topic for reasons.

 ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line
 164) Exception in thread Thread[Thrift:641,5,main]
 ...
 INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,915
 ThriftServer.java (line 116) Stop listening to thrift clients

 What was the error ?

 What version are you using?
 If you have changed any defaults for memory in cassandra-env.sh or
 cassandra.yaml revert them. Generally C* will do the right thing and not
 OOM, unless you are trying to store a lot of data on a node that does not
 have enough memory. See this thread for background
 http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 12/04/2013, at 7:35 AM, Nikolay Mihaylov n...@nmmm.nu wrote:

 For one project I will need to run cassandra on following dedicated
 servers:

 Single CPU XEON 4 cores no hyper-threading, 8 GB RAM, 12 TB locally
 attached HDD's in some kind of RAID, visible as single HDD.

 I can do cluster of 20-30 such servers, may be even more.

 The data will be huge, I am estimating 4-6 TB per server. I know this
 is best, but those are my resources.

 Currently I am testing with one of such servers, except HDD is 300 GB.
 Every 15-20 hours, I get out of heap memory, e.g. something like:

 ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line
 164) Exception in thread Thread[Thrift:641,5,main]
 ...
 INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,915
 ThriftServer.java (line 116) Stop listening to thrift clients
 INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,943
 Gossiper.java (line 1077) Announcing shutdown
 INFO [StorageServiceShutdownHook] 2013-04-11 11:26:08,613
 MessagingService.java (line 682) Waiting for messaging service to quiesce
 INFO [ACCEPT-/208.94.232.37] 2013-04-11 11:26:08,655
 MessagingService.java (line 888) MessagingService shutting down server
 thread.
 ERROR [Thrift:721] 2013-04-11 11:26:37,709 CustomTThreadPoolServer.java
 (line 217) Error occurred during processing of message.
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
 shut down

 Anyone have some advices about better utilization of such servers?

 Nick.







running cassandra on 8 GB servers

2013-04-11 Thread Nikolay Mihaylov
For one project I will need to run cassandra on the following dedicated servers:

Single CPU XEON 4 cores no hyper-threading, 8 GB RAM, 12 TB locally
attached HDD's in some kind of RAID, visible as single HDD.

I can do a cluster of 20-30 such servers, maybe even more.

The data will be huge, I am estimating 4-6 TB per server. I know this is not
ideal, but those are my resources.

Currently I am testing with one of such servers, except HDD is 300 GB.
Every 15-20 hours, I get out of heap memory, e.g. something like:

ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line 164)
Exception in thread Thread[Thrift:641,5,main]
...
 INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,915
ThriftServer.java (line 116) Stop listening to thrift clients
 INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,943 Gossiper.java
(line 1077) Announcing shutdown
 INFO [StorageServiceShutdownHook] 2013-04-11 11:26:08,613
MessagingService.java (line 682) Waiting for messaging service to quiesce
 INFO [ACCEPT-/208.94.232.37] 2013-04-11 11:26:08,655 MessagingService.java
(line 888) MessagingService shutting down server thread.
ERROR [Thrift:721] 2013-04-11 11:26:37,709 CustomTThreadPoolServer.java
(line 217) Error occurred during processing of message.
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
shut down

Does anyone have any advice about better utilization of such servers?

Nick.


Re: running cassandra on 8 GB servers

2013-04-11 Thread Nikolay Mihaylov
I am using 1.2.3. I used the default heap - 2 GB, without JNA installed -
then modified the heap to 4 GB / 400 MB young generation, with JNA installed.
The bloom filter on the CFs is lowered (more false positives, less disk space).
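
The bloom filter change was along these lines (the fp chance value is just an example; CRAWLER/counters are the keyspace/CF from the log below, and the statement assumes the CF is visible to CQL3 - otherwise the same attribute can be set through cassandra-cli):

cqlsh <<'EOF'
-- allow more false positives in exchange for smaller bloom filters
ALTER TABLE "CRAWLER".counters WITH bloom_filter_fp_chance = 0.1;
EOF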

 WARN [ScheduledTasks:1] 2013-04-11 11:09:41,899 GCInspector.java (line
142) Heap is 0.9885574036095974 full.  You may need to reduce memtable
and/or cache sizes.  Cassandra will now flush up to the two largest
memtables to free up memory.  Adjust flush_largest_memtables_at threshold
in cassandra.yaml if you don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2013-04-11 11:09:41,906 StorageService.java (line
3541) Flushing CFS(Keyspace='CRAWLER', ColumnFamily='counters') to relieve
memory pressure
 INFO [ScheduledTasks:1] 2013-04-11 11:09:41,949 ColumnFamilyStore.java
(line 637) Enqueuing flush of Memtable-counters@862481781(711504/6211531
serialized/live bytes, 11810 ops)
ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line 164)
Exception in thread Thread[Thrift:641,5,main]
java.lang.OutOfMemoryError: *Java heap space*



On Thu, Apr 11, 2013 at 11:26 PM, aaron morton aa...@thelastpickle.com wrote:

  The data will be huge, I am estimating 4-6 TB per server. I know this is
 best, but those are my resources.
 You will have a very unhappy time.

 The general rule of thumb / guideline for a HDD based system with 1G
 networking is 300GB to 500Gb per node. See previous discussions on this
 topic for reasons.

  ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line
 164) Exception in thread Thread[Thrift:641,5,main]
  ...
   INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,915
 ThriftServer.java (line 116) Stop listening to thrift clients
 What was the error ?

 What version are you using?
 If you have changed any defaults for memory in cassandra-env.sh or
 cassandra.yaml revert them. Generally C* will do the right thing and not
 OOM, unless you are trying to store a lot of data on a node that does not
 have enough memory. See this thread for background
 http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 12/04/2013, at 7:35 AM, Nikolay Mihaylov n...@nmmm.nu wrote:

  For one project I will need to run cassandra on following dedicated
 servers:
 
  Single CPU XEON 4 cores no hyper-threading, 8 GB RAM, 12 TB locally
 attached HDD's in some kind of RAID, visible as single HDD.
 
  I can do cluster of 20-30 such servers, may be even more.
 
  The data will be huge, I am estimating 4-6 TB per server. I know this is
 best, but those are my resources.
 
  Currently I am testing with one of such servers, except HDD is 300 GB.
 Every 15-20 hours, I get out of heap memory, e.g. something like:
 
  ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line
 164) Exception in thread Thread[Thrift:641,5,main]
  ...
   INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,915
 ThriftServer.java (line 116) Stop listening to thrift clients
   INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,943 Gossiper.java
 (line 1077) Announcing shutdown
   INFO [StorageServiceShutdownHook] 2013-04-11 11:26:08,613
 MessagingService.java (line 682) Waiting for messaging service to quiesce
   INFO [ACCEPT-/208.94.232.37] 2013-04-11 11:26:08,655
 MessagingService.java (line 888) MessagingService shutting down server
 thread.
  ERROR [Thrift:721] 2013-04-11 11:26:37,709 CustomTThreadPoolServer.java
 (line 217) Error occurred during processing of message.
  java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
 shut down
 
  Anyone have some advices about better utilization of such servers?
 
  Nick.