date:20120802

[
https://issues.apache.org/jira/browse/CASSANDRA-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427193#comment-13427193
]

Sylvain Lebresne commented on CASSANDRA-2864:
-

I looked a bit into doing this 'skip the cache on read' for counters, but I
realized that there are some complications.

The way the current patch works, when we do a read through the cache and there
is nothing cached, we do a normal read and cache the result. And by normal
read, I mean one that reads all the memtables in particular. Which, I think,
has 2 problems:
* This is racy, even in the non-counter case. Suppose we start a read. Maybe
the current memtable is empty and we start reading the sstables. While that
happens, you could have updates coming into the memtable and, if you are
unlucky, have this memtable flushed almost right away. If it gets merged into
the cache before our read finishes and caches its result, we have a problem
similar to CASSANDRA-3862. Granted, this is very unlikely, but it is possible,
at least in theory. Which means we probably need some sentinel business even
with this patch to be safe.
* This doesn't work for counters at all, because it means that when we cache
stuff, we cache all the data that is currently in any current memtable. It
means that for counters, the read-to-cache should only read from the sstables,
not the memtables. But then that is racy with flush and becomes quite subtle to
synchronize correctly. It's probably doable, but it makes me wonder whether
keeping the current code path for counters is not way simpler.
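
As a rough illustration of the "sentinel business" mentioned above, here is a minimal sketch in Python (not Cassandra's Java). The class `SentinelRowCache` and its method names are hypothetical and are not taken from the patch. The idea shown: a reader installs a unique sentinel before reading, and its result is published only if that same sentinel is still in place when the read finishes, so a flush that touched the key in between makes the stale result get dropped.

```python
import threading


class SentinelRowCache:
    """Toy read-through cache guarded by per-read sentinels (hypothetical API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # key -> cached row (a dict) or a sentinel object

    def begin_read(self, key):
        """Install a fresh sentinel for key unless an entry already exists.

        Returns the sentinel, or None when the key already has an entry
        (either a cached row or another reader's sentinel).
        """
        with self._lock:
            if key in self._entries:
                return None
            sentinel = object()
            self._entries[key] = sentinel
            return sentinel

    def finish_read(self, key, sentinel, row):
        """Publish row only if our sentinel is still in place.

        Returns True when the row was cached, False when it was discarded.
        """
        with self._lock:
            if self._entries.get(key) is sentinel:
                self._entries[key] = row
                return True
            return False

    def invalidate(self, key):
        """Called by a concurrent flush/merge: drops the row or the sentinel."""
        with self._lock:
            self._entries.pop(key, None)

    def get(self, key):
        """Return the cached row, or None if absent or still a sentinel."""
        with self._lock:
            value = self._entries.get(key)
            return value if isinstance(value, dict) else None
```

With this shape, a read that loses the race simply returns its result to the caller without caching it, and a later read repopulates the cache.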

Alternative Row Cache Implementation

Key: CASSANDRA-2864
URL: https://issues.apache.org/jira/browse/CASSANDRA-2864
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Daniel Doubleday
Assignee: Daniel Doubleday
Labels: cache
Fix For: 1.2

Attachments: 0001-CASSANDRA-2864-w-out-direct-counter-support.patch,
rowcache-with-snaptree-sketch.patch

We have been working on an alternative implementation to the existing row
cache(s).
We have 2 main goals:
- Decrease memory use: get more rows into the cache without suffering a huge
performance penalty
- Reduce GC pressure
This sounds a lot like we should be using the new serializing cache in 0.8.
Unfortunately, our workload consists of loads of updates, which would
invalidate the cache all the time.
*Note: Updated patch description (please check the history if you're interested
in where this was coming from)*
h3. Rough Idea
- Keep a serialized row (ByteBuffer) in memory which represents the unfiltered
but collated columns of all sstables, but not the memtable columns
- Writes don't affect the cache at all. They go only to the memtables
- Reads collect columns from the memtables and the row cache
- The serialized row is rewritten (merged) with the memtables when they are
flushed
h3. Some Implementation Details
h4. Reads
- Basically, the read logic differs from regular uncached reads only in that a
special CollationController deserializes columns from in-memory bytes
- In the first version of this cache the serialized in-memory format was the
same as the fs format, but tests showed that performance suffered because a lot
of unnecessary deserialization takes place and column seeks are O( n )
within one block
- To improve on that, a different in-memory format was used. It splits the
length meta info and the data of columns so that the names can be binary
searched.
{noformat}
===
Header (24)
===
MaxTimestamp:long
LocalDeletionTime: int
MarkedForDeleteAt: long
NumColumns: int
===
Column Index (num cols * 12)
===
NameOffset: int
ValueOffset: int
ValueLength: int
===
Column Data
===
Name:byte[]
Value: byte[]
SerializationFlags: byte
Misc:?
Timestamp: long
---
Misc Counter Column
---
TSOfLastDelete: long
---
Misc Expiring Column
---
TimeToLive: int
LocalDeletionTime: int
===
{noformat}
- These rows are read by 2 new column iterators which correspond to
SSTableNamesIterator and SSTableSliceIterator. During filtering, only columns
that actually match are constructed. The searching / skipping is performed on
the raw ByteBuffer and does not create any objects.
- A special CollationController is used to access and collate via the cache and
said new iterators.
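
The layout above can be sketched as follows. This is a simplified Python illustration, not the patch's Java code. It assumes big-endian encoding, offsets measured from the start of the buffer, columns sorted by name, and it omits the optional `Misc` fields. The function names `serialize_row` and `find_column` and the placeholder header defaults are hypothetical.

```python
import struct

# Header (24 bytes): MaxTimestamp, LocalDeletionTime, MarkedForDeleteAt, NumColumns
HEADER = struct.Struct(">qiqi")
# Column index entry (12 bytes): NameOffset, ValueOffset, ValueLength
INDEX_ENTRY = struct.Struct(">iii")
# Per-column trailer after the value: SerializationFlags, Timestamp (Misc omitted)
TRAILER = struct.Struct(">bq")


def serialize_row(columns, local_deletion_time=0x7FFFFFFF,
                  marked_for_delete_at=-(2 ** 63)):
    """Serialize (name, value, timestamp) tuples into one buffer.

    The two defaults are placeholder "not deleted" markers (an assumption).
    """
    cols = sorted(columns)  # sorted by name so names can be binary searched
    max_ts = max((ts for _, _, ts in cols), default=0)
    data_start = HEADER.size + INDEX_ENTRY.size * len(cols)
    index = bytearray()
    data = bytearray()
    for name, value, ts in cols:
        name_offset = data_start + len(data)
        value_offset = name_offset + len(name)
        index += INDEX_ENTRY.pack(name_offset, value_offset, len(value))
        data += name + value + TRAILER.pack(0, ts)
    header = HEADER.pack(max_ts, local_deletion_time,
                         marked_for_delete_at, len(cols))
    return bytes(header) + bytes(index) + bytes(data)


def find_column(buf, name):
    """Binary search the column index; return (value, timestamp) or None."""
    num_columns = HEADER.unpack_from(buf, 0)[3]
    lo, hi = 0, num_columns - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        entry_pos = HEADER.size + mid * INDEX_ENTRY.size
        name_off, value_off, value_len = INDEX_ENTRY.unpack_from(buf, entry_pos)
        candidate = buf[name_off:value_off]  # the name ends where the value starts
        if candidate == name:
            _flags, ts = TRAILER.unpack_from(buf, value_off + value_len)
            return buf[value_off:value_off + value_len], ts
        if candidate < name:
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```

Because each index entry has a fixed size, the search touches only O(log n) names instead of scanning a block. Unlike the iterators described above, this sketch does copy the name bytes for each comparison.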

[jira] [Created] (CASSANDRA-4482) In-memory merkle trees for repair

2012-08-02 Thread Marcus Eriksson (JIRA)

Marcus Eriksson created CASSANDRA-4482:
--

 Summary: In-memory merkle trees for repair
 Key: CASSANDRA-4482
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4482
 Project: Cassandra
  Issue Type: New Feature
Reporter: Marcus Eriksson


this sounds cool, we should reimplement it in the open source cassandra;

http://www.acunu.com/2/post/2012/07/incremental-repair.html



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427314#comment-13427314
 ] 

Sylvain Lebresne commented on CASSANDRA-3974:
-

I'm sorry I'm a little late to the discussion, but I'm not sure I'm a fan of 
using the metadata TTL to decide on expiration because:
# It means we use the column timestamp to decide on expiration. However, we 
have been very careful so far not to use the column timestamp as a server-side 
timestamp. And in particular, the patch assumes the timestamp is in 
milliseconds, while most clients and CQL actually use microseconds.
# Altering the default TTL is imo more confusing that way, because we are 
pretending that altering the TTL will apply to all existing CF and columns, 
which itself suggests that if you want to remove everything older than, say, 
1h, you can switch the TTL to 1h and then change it back right away to some 
other much longer value (or 0). But that's not the case, because the new TTL 
will be applied to existing data only when compaction happens. And I really 
don't think that user-visible behavior should depend in any way on the timing 
of internal operations.
# This requires passing the CFMetadata in lots of places in the code, which 
isn't really nice. In particular, we should call isColumnExpiredFromDefaultTTL 
pretty much every time DeletionInfo.isDeleted() is called (after all, having an 
expired column is exactly the same as having a deleted one), and the current 
patch is missing quite a few places.

So I think I do prefer the idea of having the CF TTL just being the default TTL 
applied to columns when inserted if they don't have one. 
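
The preferred behavior (CF TTL as a default filled in at insert time) could look roughly like this. It is a Python sketch under assumptions, not Cassandra code: columns are plain dicts, `written_at` is a server-side write time in seconds, and the helper names `apply_default_ttl` and `is_expired` are hypothetical.

```python
def apply_default_ttl(column, cf_default_ttl):
    """Return the column with the CF default TTL filled in when it has none.

    A default of 0 means "no default TTL" in this sketch.
    """
    if column.get("ttl") is None and cf_default_ttl > 0:
        return {**column, "ttl": cf_default_ttl}
    return column


def is_expired(column, now_seconds):
    """Expiration depends only on data stored with the column itself."""
    ttl = column.get("ttl")
    if ttl is None:
        return False
    return now_seconds >= column["written_at"] + ttl
```

Since the TTL is fixed on the column when it is written, later changes to the CF default affect only new inserts, and a read gives the same answer regardless of when compaction runs.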


 Per-CF TTL
 --

 Key: CASSANDRA-3974
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
 Project: Cassandra
  Issue Type: New Feature
Affects Versions: 1.2
Reporter: Jonathan Ellis
Assignee: Kirk True
Priority: Minor
 Fix For: 1.2

 Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, 
 trunk-3974v4.txt


 Per-CF TTL would allow compaction optimizations (drop an entire sstable's 
 worth of expired data) that we can't do with per-column.


[Cassandra Wiki] Update of ArchitectureInternals_JP by Kazuki Aranami

Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The ArchitectureInternals_JP page has been changed by Kazuki Aranami:
http://wiki.apache.org/cassandra/ArchitectureInternals_JP?action=diffrev1=17rev2=18

Comment:
add accrual failure detector

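   * See ArchitectureGossip for details.
  
  = Failure detection =
-  * Based on The Phi accrual failure detector (Φ incremental failure detector) http://vsedach.googlepages.com/HDY04.pdf.
+  * Based on The Phi accrual failure detector (Φ accrual failure detection method) http://vsedach.googlepages.com/HDY04.pdf, which Associate Professor Naohiro Hayashibara of the Department of Computer Science, Faculty of Computer Science and Engineering, Kyoto Sangyo University, developed while he was enrolled at the Japan Advanced Institute of Science and Technology. For papers and other information on the "accrual failure detection method" developed by Professor Hayashibara, see Kyoto Sangyo University's [[http://www.kyoto-su.ac.jp/liaison/kenkyu/message43.html|laboratory introduction]].
  
  = For a deeper understanding =
   * The idea of dividing tasks into stages and assigning a separate thread pool to each came from the SEDA paper http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf.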

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427342#comment-13427342
 ] 

Jonathan Ellis commented on CASSANDRA-3974:
---

If we're just going to have CF TTL being sugar for clients too lazy to apply 
what they want, then I'm not interested.

But if we use CF TTL to provide an upper bound on how long data can live, then 
we open the door for some interesting optimizations.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427343#comment-13427343
 ] 

Sylvain Lebresne commented on CASSANDRA-3974:
-

Well, if the goal is just to be able to drop entire sstables when we know 
everything is expired, we could compute and keep in the metadata the min TTL of 
the sstable. 
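
One possible reading of this suggestion, sketched in Python under assumptions: the per-sstable metadata records the latest expiration time seen among its columns (plus whether any column has no TTL), and the whole sstable is considered droppable once that time has passed. The class `SSTableExpiryStats` is hypothetical, and the sketch ignores gc grace and data in overlapping sstables.

```python
class SSTableExpiryStats:
    """Toy per-sstable metadata collected while the sstable is written."""

    def __init__(self):
        self.max_expires_at = None     # latest expiration time seen, in seconds
        self.has_non_expiring = False  # True if any column has no TTL

    def add(self, written_at, ttl):
        """Record one column; ttl is None for a column that never expires."""
        if ttl is None:
            self.has_non_expiring = True
            return
        expires_at = written_at + ttl
        if self.max_expires_at is None or expires_at > self.max_expires_at:
            self.max_expires_at = expires_at

    def fully_expired(self, now_seconds):
        """True only when every recorded column carried a TTL that has passed."""
        return (not self.has_non_expiring
                and self.max_expires_at is not None
                and now_seconds >= self.max_expires_at)
```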


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427346#comment-13427346
 ] 

Jonathan Ellis commented on CASSANDRA-3974:
---

Hmm.  Now that you mention it, Yuki already added that in CASSANDRA-3442...


[jira] [Commented] (CASSANDRA-4482) In-memory merkle trees for repair


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427347#comment-13427347
 ] 

Jonathan Ellis commented on CASSANDRA-4482:
---

This is how the original Dynamo paper describes maintaining merkle trees.  The 
problem that Acunu doesn't mention is that this forces you to read and 
rehash all the rows sharing the tree leaf with row X, whenever any row X is 
updated.  So you are trading sequential i/o for random i/o... not a good move, 
unless you assume SSD or a small dataset (and even then, you're rehashing many 
rows on each update, not just one).

I'm a bigger fan of the continuous repair approach enabled by CASSANDRA-3912, 
and discussed in CASSANDRA-2699 (although i think 2699 overcomplicates things).
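
The rehashing cost described above can be made concrete with a small Python sketch. It models only the leaf level of a merkle tree, with rows assigned to leaves by hashing the key. The class `LeafBuckets`, the choice of SHA-256, and the `rows_rehashed` counter are illustrative assumptions, not how Acunu or Cassandra implement it.

```python
import hashlib


class LeafBuckets:
    """Leaf level of an incrementally maintained merkle tree (toy model)."""

    def __init__(self, num_leaves):
        self.rows = [dict() for _ in range(num_leaves)]
        self.hashes = [self._hash_leaf({}) for _ in range(num_leaves)]
        self.rows_rehashed = 0  # total rows read and hashed across all updates

    def _leaf_for(self, key):
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % len(self.rows)

    @staticmethod
    def _hash_leaf(rows):
        h = hashlib.sha256()
        for key in sorted(rows):
            h.update(key.encode() + b"\0" + rows[key].encode() + b"\0")
        return h.hexdigest()

    def update(self, key, value):
        """Apply one row update and recompute the hash of its leaf."""
        leaf = self._leaf_for(key)
        self.rows[leaf][key] = value
        # Every row sharing the leaf has to be read and rehashed, not just `key`.
        self.rows_rehashed += len(self.rows[leaf])
        self.hashes[leaf] = self._hash_leaf(self.rows[leaf])
        return leaf
```

In a real store those neighbouring rows live on disk, so each update in this scheme turns into reads of every row in the leaf.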


[jira] [Comment Edited] (CASSANDRA-4482) In-memory merkle trees for repair

[
https://issues.apache.org/jira/browse/CASSANDRA-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427347#comment-13427347
]

Jonathan Ellis edited comment on CASSANDRA-4482 at 8/2/12 2:34 PM:
---

This is how the original Dynamo paper describes maintaining merkle trees. The
problem that Acunu doesn't mention is that this forces you to read and
rehash all the rows sharing the tree leaf with row X, whenever any row X is
updated. So you are trading sequential i/o for random i/o... not a good move,
unless you assume SSD or a small dataset (and even then, you're rehashing many
rows on each update, not just one, so it's far from clear that this is a good
trade).

I'm a bigger fan of the continuous repair approach enabled by CASSANDRA-3912,
and discussed in CASSANDRA-2699 (although i think 2699 overcomplicates things).


[Cassandra Wiki] Update of FrontPage_JP by Kazuki Aranami

Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The FrontPage_JP page has been changed by Kazuki Aranami:
http://wiki.apache.org/cassandra/FrontPage_JP?action=diffrev1=85rev2=86

  
  Cassandra is a highly scalable, eventually consistent, distributed KVS (Key Value Store).
  Cassandra combines two distributed-systems technologies: the distributed hash table (DHT) of [[http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf|Amazon Dynamo (PDF)]], which is built mainly from Berkeley DB and MySQL, and the data model of [[http://research.google.com/archive/bigtable-osdi06.pdf|Google BigTable (PDF)]].
- Like Amazon Dynamo, Cassandra is [[http://www.allthingsdistributed.com/2008/12/eventually_consistent.html|eventually consistent]], and like Google BigTable, Cassandra provides a column-family-based data model (データ・モデル) that is richer than a typical KVS (Key Value Store).
+ Like Amazon Dynamo, Cassandra is [[http://www.allthingsdistributed.com/2008/12/eventually_consistent.html|eventually consistent]], and like Google BigTable, Cassandra provides a column-family-based data model (データモデル) that is richer than a typical KVS (Key Value Store).
  
  
- Cassandra was open-sourced by Facebook in 2008 and was designed by Avinash Lakshman (one of the authors of Amazon Dynamo) and Prashant Malik (a Facebook engineer). Cassandra was born from the fusion of Amazon Dynamo and Google BigTable, and can also be thought of as ''Dynamo 2.0''. Cassandra is in production use at Facebook, but it is still a product under development.
+ Cassandra was released as open source by Facebook in July 2008. Cassandra became an Apache Incubator project, and in March 2010 it became an Apache top-level project. Cassandra was designed by Avinash Lakshman (one of the authors of Amazon Dynamo) and Prashant Malik (a Facebook engineer). Cassandra was born from the fusion of Amazon Dynamo and Google BigTable, and can also be thought of as ''Dynamo 2.0''. The original version of Cassandra was developed for Facebook's Inbox Search, but it is no longer used at Facebook. Instead, Cassandra has now been adopted by many companies in Japan and abroad.
  
  == Overview ==
   * [[http://cassandra.apache.org/|Official Cassandra website]] (release downloads, bug tracking, mailing lists, etc.)
@@ -28, +28 @@

   * [[GettingStarted_JP|Getting started]]
   * [[http://www.datastax.com/docs|Datastax's Cassandra documentation]]
   * [[http://www.datastax.com/docs/0.6_jp/index|Datastax's Apache Cassandra Japanese manual 0.6]] - Note: unfortunately, the content is out of date. For 0.7 and later, see the English documentation above.
-  * [[ClientOptions_JP|Client list: how to access Cassandra]] -- interfaces for Ruby, Python, Scala and others
+  * [[ClientOptions_JP|Client list: how to access Cassandra]] -- interfaces for Ruby, Python, Scala, and other programming languages
   * [[IntegrationPoints|Integration points]] -- list of ways Cassandra is integrated with other projects/products
   * [[RunningCassandra_JP|Running Cassandra]]
   * [[ArchitectureOverview_JP|Architecture overview]]
@@ -71, +71 @@

   * Cassandra developer mailing list: d...@cassandra.apache.org [[mailto:dev-subscr...@cassandra.apache.org|(subscribe)]] [[http://www.mail-archive.com/dev@cassandra.apache.org/|(archive)]] [[http://www.mail-archive.com/cassandra-dev@incubator.apache.org/|(incubator archive)]]
   * Cassandra commit notification mailing list: commits@cassandra.apache.org [[mailto:commits-subscr...@cassandra.apache.org|(subscribe)]]
  
- == Community in Japan (コミュニティ) ==
+ == Community in Japan (コミュニティー) ==
-  * [[https://sites.google.com/site/cassandrajapan/home|Japan Cassandra Users Group (ユーザ会)]]
+  * [[https://sites.google.com/site/cassandrajapan/home|Japan Cassandra Users Group (ユーザー会)]]
  
  
  == Related documents ==
@@ -82, +82 @@

  == Google SoC 2010 Page ==
   * [[GoogleSoc2010|Google SoC]]
  
+ == Thanks ==
+  * YourKit is kindly supporting open source projects with its full-featured 
Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools 
for profiling Java and .NET applications. Take a look at YourKit's leading 
software products: [[http://www.yourkit.com/java/profiler/index.jsp|YourKit 
Java Profiler]] and [[http://www.yourkit.com/.net/profiler/index.jsp|YourKit 
.NET Profiler]]
+ 
- This wiki is powered by MoinMoin. With the exception of a few immutable 
pages, anyone can edit it. Try SyntaxReference if you need help on wiki markup, 
and FindPage or SiteNavigation to search for existing pages before creating a 
new one. If you aren't sure where to begin, checkout RecentChanges to see what 
others have been working on, or RandomPage if you are feeling lucky.
+ This wiki is powered by MoinMoin.  With the exception of a few immutable 
pages, anyone can edit it. Try SyntaxReference if you need help on wiki markup, 
and FindPage or SiteNavigation to search for existing pages before creating a 
new one. If you aren't sure where to begin, checkout RecentChanges to see what 
others have been working on, or RandomPage if you are feeling lucky.
  
  == Other languages ==
   * [[FrontPage|English 英語]]
   * [[首页|SimpleChinese 简体中文]]
+  * [[FrontPage_CHT|Traditional Chinese 繁體中文]]
   * [[FrontPage_PT-BR|BrazilianPortuguese Português do Brasil]]

[jira] [Commented] (CASSANDRA-3680) Add Support for Composite Secondary Indexes


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427369#comment-13427369
 ] 

Sylvain Lebresne commented on CASSANDRA-3680:
-

I've pushed a rebased version of the patch above at 
https://github.com/pcmanus/cassandra/commits/3680-2. The previous comments 
still apply though.

 Add Support for Composite Secondary Indexes
 ---

 Key: CASSANDRA-3680
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3680
 Project: Cassandra
  Issue Type: Sub-task
Reporter: T Jake Luciani
Assignee: Sylvain Lebresne
  Labels: cql3, secondary_index
 Fix For: 1.2

 Attachments: 0001-Secondary-indexes-on-composite-columns.txt


 CASSANDRA-2474 and CASSANDRA-3647 add the ability to transpose wide rows 
 differently. For efficiency and functionality, the secondary index API needs 
 to be altered to allow composite indexes.  
 I think this will require the IndexManager API to have a 
 maybeIndex(ByteBuffer column) method that SS can call, and to implement a 
 PerRowSecondaryIndex per column that breaks the composite into parts and 
 indexes specific bits, also including the base rowkey.
 Then a search against a TRANSPOSED row or DOCUMENT will be possible.
  


[Cassandra Wiki] Trivial Update of ArchitectureInternals_JP by Kazuki Aranami

Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The ArchitectureInternals_JP page has been changed by Kazuki Aranami:
http://wiki.apache.org/cassandra/ArchitectureInternals_JP?action=diffrev1=18rev2=19

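   * See ArchitectureGossip for details.
  
  = Failure detection =
-  * Based on The Phi accrual failure detector (Φ accrual failure detection method) http://vsedach.googlepages.com/HDY04.pdf, which Associate Professor Naohiro Hayashibara of the Department of Computer Science, Faculty of Computer Science and Engineering, Kyoto Sangyo University, developed while he was enrolled at the Japan Advanced Institute of Science and Technology. For papers and other information on the "accrual failure detection method" developed by Professor Hayashibara, see Kyoto Sangyo University's [[http://www.kyoto-su.ac.jp/liaison/kenkyu/message43.html|laboratory introduction]].
+  * Based on The Phi accrual failure detector (Φ accrual failure detection method) http://vsedach.googlepages.com/HDY04.pdf, which Associate Professor Naohiro Hayashibara of the Department of Computer Science, Faculty of Computer Science and Engineering, Kyoto Sangyo University, developed while he was enrolled at the Japan Advanced Institute of Science and Technology. For other information on the "accrual failure detection method" developed by Professor Hayashibara, see Kyoto Sangyo University's http://rudds.kyoto-su.ac.jp/jp/wiki.cgi?page=Research[[http://www.kyoto-su.ac.jp/liaison/kenkyu/message43.html|research overview of the Highly Reliable Distributed Systems Laboratory]] and [[http://www.kyoto-su.ac.jp/liaison/kenkyu/message43.html|laboratory introduction]].
  
  = For a deeper understanding =
   * The idea of dividing tasks into stages and assigning a separate thread pool to each came from the SEDA paper http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf.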

[Cassandra Wiki] Trivial Update of ArchitectureInternals_JP by Kazuki Aranami

Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The ArchitectureInternals_JP page has been changed by Kazuki Aranami:
http://wiki.apache.org/cassandra/ArchitectureInternals_JP?action=diffrev1=19rev2=20

   * See [[DistributedDeletes_JP|DistributedDeletes]].
  
  = Gossip protocol =
-  * Based on Efficient reconciliation and flow control for anti-entropy protocols http://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf.
+  * Based on [[http://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf|Efficient reconciliation and flow control for anti-entropy protocols (PDF)]].
   * See ArchitectureGossip for details.
  
  = Failure detection =
-  * Based on The Phi accrual failure detector (Φ accrual failure detection method) http://vsedach.googlepages.com/HDY04.pdf, which Associate Professor Naohiro Hayashibara of the Department of Computer Science, Faculty of Computer Science and Engineering, Kyoto Sangyo University, developed while he was enrolled at the Japan Advanced Institute of Science and Technology. For other information on the "accrual failure detection method" developed by Professor Hayashibara, see Kyoto Sangyo University's http://rudds.kyoto-su.ac.jp/jp/wiki.cgi?page=Research[[http://www.kyoto-su.ac.jp/liaison/kenkyu/message43.html|research overview of the Highly Reliable Distributed Systems Laboratory]] and [[http://www.kyoto-su.ac.jp/liaison/kenkyu/message43.html|laboratory introduction]].
+  * Based on [[http://vsedach.googlepages.com/HDY04.pdf|The Phi accrual failure detector (Φ accrual failure detection method) (PDF)]], which Associate Professor Naohiro Hayashibara of the Department of Computer Science, Faculty of Computer Science and Engineering, Kyoto Sangyo University, developed while he was enrolled at the Japan Advanced Institute of Science and Technology. For other information on the "accrual failure detection method" developed by Professor Hayashibara, see Kyoto Sangyo University's [[http://rudds.kyoto-su.ac.jp/jp/wiki.cgi?page=Research|research overview of the Highly Reliable Distributed Systems Laboratory]] and [[http://www.kyoto-su.ac.jp/liaison/kenkyu/message43.html|laboratory introduction]].
  
  = For a deeper understanding =
-  * The idea of dividing tasks into stages and assigning a separate thread pool to each came from the SEDA paper http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf.
+  * The idea of dividing tasks into stages and assigning a separate thread pool to each came from the [[http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf|SEDA (PDF)]] paper.
   * "Crash-only" design is a broadly applied principle. [[http://lwn.net/Articles/191059/|Valerie Henson's LWN article]] is the best introduction.
-  * Cassandra's distribution is very closely related to what is described in Amazon's Dynamo paper. Read repair, tunable consistency levels, Hinted Handoff, and other concepts are discussed there. This is must-read material as background knowledge. http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html The [[http://www.allthingsdistributed.com/2008/12/eventually_consistent.html|article on eventual consistency]] is also relevant.
+  * The workings of Cassandra's distributed system are closely related to what is described in Amazon's Dynamo paper. Read repair, tunable consistency levels, Hinted Handoff, and other concepts are discussed there. This is must-read [[http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html|material]] as background knowledge. The [[http://www.allthingsdistributed.com/2008/12/eventually_consistent.html|article on eventual consistency]] is also relevant.
   * Cassandra's on-disk storage model is largely based on sections 5.3 and 5.4 of the [[http://labs.google.com/papers/bigtable.html|Bigtable paper]].
-  * Facebook's Cassandra team presented a Cassandra paper at LADIS09. http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf Most of the information also applies to Apache Cassandra. (The integration with !ZooKeeper is the main difference.)
+  * Facebook's Cassandra team presented the [[http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf|Cassandra paper (PDF)]] at LADIS09. Today, the integration with ZooKeeper is the main difference.

[jira] [Commented] (CASSANDRA-4292) Per-disk I/O queues


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427392#comment-13427392
 ] 

Jonathan Ellis commented on CASSANDRA-4292:
---

v3 looks good enough to do some performance testing to see if it's worth 
polishing more. :)

bq. Can we use CopyOnWriteArrayList 

Nit: Looking at this again it should probably actually be an ImmutableList.

 Per-disk I/O queues
 ---

 Key: CASSANDRA-4292
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4292
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Yuki Morishita
 Fix For: 1.2

 Attachments: 4292-v2.txt, 4292-v3.txt, 4292.txt


 As noted in CASSANDRA-809, we have a certain amount of flush (and compaction) 
 threads, which mix and match disk volumes indiscriminately.  It may be worth 
 creating a tight thread - disk affinity, to prevent unnecessary conflict at 
 that level.
 OTOH as SSDs become more prevalent this becomes a non-issue.  Unclear how 
 much pain this actually causes in practice in the meantime.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427397#comment-13427397
 ] 

Jonathan Ellis commented on CASSANDRA-3974:
---

bq. If we're just going to have CF TTL being sugar for clients too lazy to 
apply what they want, then I'm not interested.

I guess it would be a good thing to have for CQL though by the same reasoning 
as CASSANDRA-4448.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427399#comment-13427399
 ] 

Sylvain Lebresne commented on CASSANDRA-3974:
-

bq. I guess it would be a good thing to have for CQL though by the same 
reasoning as CASSANDRA-4448.

Agreed.


[jira] [Created] (CASSANDRA-4483) Restarting service

2012-08-02 Thread Vladimir Barinov (JIRA)

Vladimir Barinov created CASSANDRA-4483:
---

 Summary: Restarting service
 Key: CASSANDRA-4483
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4483
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.2
 Environment: CentOS release 6.2 (Final)
Reporter: Vladimir Barinov


Cassandra cannot restart too quickly. 
# /etc/init.d/cassandra status
cassandra (pid  6843) is running...
# /etc/init.d/cassandra restart
Shutdown Cassandra: OK
Starting Cassandra: OK
# /etc/init.d/cassandra status
cassandra is stopped

cassandra log:

xss =  -ea -javaagent:/usr/share/cassandra//lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1024M -Xmx1024M 
-Xmn200M -XX:+HeapDumpOnOutOfMemoryError -Xss128k
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port 
already in use: 7199; nested exception is: 
java.net.BindException: Address already in use
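
The log suggests that the new JVM started while port 7199 was still held, presumably by the old process that had not fully exited. One way a restart wrapper could guard against that is to wait for the port to be released before starting. The sketch below is an assumption-laden Python illustration, not the actual init script logic: `port_in_use` and `wait_until_free` are hypothetical helpers, and the timeout values are arbitrary.

```python
import socket
import time


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0


def wait_until_free(probe, timeout=30.0, interval=0.5,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll probe() until it returns False or the timeout elapses.

    Returns True once the resource is free, False on timeout.
    """
    deadline = clock() + timeout
    while probe():
        if clock() >= deadline:
            return False
        sleep(interval)
    return True
```

A restart step could then call `wait_until_free(lambda: port_in_use(7199))` between the stop and the start, and report a failure instead of "OK" when it returns False.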



[jira] [Updated] (CASSANDRA-4483) Restarting service

2012-08-02 Thread Vladimir Barinov (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Barinov updated CASSANDRA-4483:


Description: 
Cassandra can't restart too quickly.
# /etc/init.d/cassandra status
cassandra (pid  6843) is running...
# /etc/init.d/cassandra restart
Shutdown Cassandra: OK
Starting Cassandra: OK
# /etc/init.d/cassandra status
cassandra is stopped

cassandra log:

xss =  -ea -javaagent:/usr/share/cassandra//lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1024M -Xmx1024M 
-Xmn200M -XX:+HeapDumpOnOutOfMemoryError -Xss128k
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port 
already in use: 7199; nested exception is: 
java.net.BindException: Address already in use


  was:
Cassandra cannot restart too quickly. 
# /etc/init.d/cassandra status
cassandra (pid  6843) is running...
# /etc/init.d/cassandra restart
Shutdown Cassandra: OK
Starting Cassandra: OK
# /etc/init.d/cassandra status
cassandra is stopped

cassandra log:

xss =  -ea -javaagent:/usr/share/cassandra//lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1024M -Xmx1024M 
-Xmn200M -XX:+HeapDumpOnOutOfMemoryError -Xss128k
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port 
already in use: 7199; nested exception is: 
java.net.BindException: Address already in use



 Restarting service
 --

 Key: CASSANDRA-4483
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4483
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.2
 Environment: CentOS release 6.2 (Final)
Reporter: Vladimir Barinov



[jira] [Commented] (CASSANDRA-4483) Restarting service


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427414#comment-13427414
 ] 

Jonathan Ellis commented on CASSANDRA-4483:
---

This is because JMX doesn't set SO_REUSEADDR.  We'd have to create our own 
socket factory to work around this (http://vafer.org/blog/20061010091658/).
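
A minimal sketch of the kind of socket factory mentioned above, assuming the workaround is simply to set SO_REUSEADDR before binding. The class name `ReuseAddrServerSocketFactory` and the helper `reuseEnabledOnEphemeralPort` are hypothetical and are not existing Cassandra code; wiring the factory into the JMX agent is deliberately left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.rmi.server.RMIServerSocketFactory;

// Hypothetical class name, used for illustration only.
class ReuseAddrServerSocketFactory implements RMIServerSocketFactory {
    @Override
    public ServerSocket createServerSocket(int port) throws IOException {
        // Create the socket unbound so the option can be set before bind().
        ServerSocket socket = new ServerSocket();
        socket.setReuseAddress(true);
        socket.bind(new InetSocketAddress(port));
        return socket;
    }

    // Illustration helper: opens a socket on an ephemeral port (0) and
    // reports whether SO_REUSEADDR is enabled on it.
    static boolean reuseEnabledOnEphemeralPort() {
        try (ServerSocket s = new ReuseAddrServerSocketFactory().createServerSocket(0)) {
            return s.getReuseAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A factory like this would presumably be passed to `LocateRegistry.createRegistry(port, null, factory)` and to the JMX connector server environment; a production version would also want `equals` and `hashCode`, which RMI uses to compare factories.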

 Restarting service
 --

 Key: CASSANDRA-4483
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4483
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Tools
 Environment: CentOS release 6.2 (Final)
Reporter: Vladimir Barinov



[jira] [Updated] (CASSANDRA-4483) Restarting service


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-4483:
--

  Component/s: Tools
 Priority: Trivial  (was: Major)
Affects Version/s: (was: 1.1.2)
   Issue Type: Improvement  (was: Bug)

 Restarting service
 --

 Key: CASSANDRA-4483
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4483
 Project: Cassandra
  Issue Type: Improvement
  Components: Core, Tools
 Environment: CentOS release 6.2 (Final)
Reporter: Vladimir Barinov
Priority: Trivial



[jira] [Commented] (CASSANDRA-4483) Restarting service


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427415#comment-13427415
 ] 

Jonathan Ellis commented on CASSANDRA-4483:
---

(if we were going to do that, we might as well add an option to listen on just 
a single interface as well, as shown in the linked article.)
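
As a rough illustration of the single-interface option, the sketch below binds the server socket to one address instead of the wildcard address. `SingleInterfaceServerSocketFactory` and `bindsToLoopbackOnly` are hypothetical names, and the choice of the loopback address in the helper is only an assumption made so the example is runnable anywhere.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.rmi.server.RMIServerSocketFactory;

// Hypothetical class name, used for illustration only.
class SingleInterfaceServerSocketFactory implements RMIServerSocketFactory {
    private final InetAddress bindAddress;

    SingleInterfaceServerSocketFactory(InetAddress bindAddress) {
        this.bindAddress = bindAddress;
    }

    @Override
    public ServerSocket createServerSocket(int port) throws IOException {
        ServerSocket socket = new ServerSocket();
        // Bind to the configured address only, not to every local interface.
        socket.bind(new InetSocketAddress(bindAddress, port));
        return socket;
    }

    // Illustration helper: binds to the loopback address on an ephemeral
    // port and reports whether the socket is really bound to loopback.
    static boolean bindsToLoopbackOnly() {
        SingleInterfaceServerSocketFactory factory =
                new SingleInterfaceServerSocketFactory(InetAddress.getLoopbackAddress());
        try (ServerSocket s = factory.createServerSocket(0)) {
            return s.getInetAddress().isLoopbackAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```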

 Restarting service
 --

 Key: CASSANDRA-4483
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4483
 Project: Cassandra
  Issue Type: Improvement
  Components: Core, Tools
 Environment: CentOS release 6.2 (Final)
Reporter: Vladimir Barinov
Priority: Trivial
  Labels: lhf



[jira] [Updated] (CASSANDRA-4483) Restarting service


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-4483:
--

Labels: lhf  (was: )

 Restarting service
 --

 Key: CASSANDRA-4483
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4483
 Project: Cassandra
  Issue Type: Improvement
  Components: Core, Tools
 Environment: CentOS release 6.2 (Final)
Reporter: Vladimir Barinov
Priority: Trivial
  Labels: lhf



[jira] [Created] (CASSANDRA-4484) Drain causes incorrect error messages: Stream took more than 24H to complete; skipping

2012-08-02 Thread Christopher Porter (JIRA)

Christopher Porter created CASSANDRA-4484:
-

 Summary: Drain causes incorrect error messages: Stream took more 
than 24H to complete; skipping
 Key: CASSANDRA-4484
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4484
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Christopher Porter
Priority: Minor


After calling drain on a node, there are a bunch of incorrect error messages in 
the cassandra log file: Stream took more than 24H to complete; skipping.

The problem is in MessagingService.waitForStreaming. It is logging an error if 
ThreadPoolExecutor.awaitTermination returns true, but if a timeout happens it 
returns false. See 
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html#awaitTermination%28long,%20java.util.concurrent.TimeUnit%29
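
The return-value semantics described above can be checked with a small standalone sketch. This is not the code from `MessagingService.waitForStreaming`; the class `AwaitTerminationCheck` and its methods are hypothetical names chosen for the example, and the timeouts are arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical class name, used for illustration only.
class AwaitTerminationCheck {
    // Returns true only when the wait timed out, i.e. when
    // awaitTermination returned false. Logging the "took too long"
    // error when it returns true would be the inverted check.
    static boolean timedOut(ExecutorService executor, long timeout, TimeUnit unit) {
        executor.shutdown();
        try {
            return !executor.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            // Restore the interrupt flag; no timeout was observed.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // An executor with no work terminates right after shutdown().
    static boolean idleExecutorTimesOut() {
        return timedOut(Executors.newSingleThreadExecutor(), 1, TimeUnit.SECONDS);
    }

    // An executor with a long-running task is still alive after 50 ms.
    static boolean busyExecutorTimesOut() {
        ExecutorService busy = Executors.newSingleThreadExecutor();
        busy.submit(() -> { Thread.sleep(5_000); return null; });
        try {
            return timedOut(busy, 50, TimeUnit.MILLISECONDS);
        } finally {
            busy.shutdownNow(); // interrupt the sleeping task
        }
    }
}
```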


[jira] [Commented] (CASSANDRA-4484) Drain causes incorrect error messages: Stream took more than 24H to complete; skipping

2012-08-02 Thread Christopher Porter (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427438#comment-13427438
 ] 

Christopher Porter commented on CASSANDRA-4484:
---

Fixed with this pull request: https://github.com/apache/cassandra/pull/12

 Drain causes incorrect error messages: Stream took more than 24H to 
 complete; skipping
 

 Key: CASSANDRA-4484
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4484
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Christopher Porter
Priority: Minor



[jira] [Commented] (CASSANDRA-4483) Restarting service

2012-08-02 Thread Yuki Morishita (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427446#comment-13427446
 ] 

Yuki Morishita commented on CASSANDRA-4483:
---

There was discussion about JMX binding address on CASSANDRA-2967.

 Restarting service
 --

 Key: CASSANDRA-4483
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4483
 Project: Cassandra
  Issue Type: Improvement
  Components: Core, Tools
 Environment: CentOS release 6.2 (Final)
Reporter: Vladimir Barinov
Priority: Trivial
  Labels: lhf



[jira] [Created] (CASSANDRA-4485) cqlsh: support collections

Sylvain Lebresne created CASSANDRA-4485:
---

 Summary: cqlsh: support collections
 Key: CASSANDRA-4485
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4485
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
Assignee: paul cannon
Priority: Minor
 Fix For: 1.2





[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

2012-08-02 Thread Jeremy Hanna (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427448#comment-13427448
 ] 

Jeremy Hanna commented on CASSANDRA-3974:
-

bq. If we're just going to have CF TTL being sugar for clients too lazy to 
apply what they want, then I'm not interested.

Also, if that client happens to be Pig or Hive, there's currently no way to 
set TTLs.  So in that case it's not laziness of the client.

A use case: I don't want to MapReduce over my giant archival column family. 
So when ingesting data, I'll write to my archival column family and, in 
addition, to a column family with a default TTL (or however it's implemented), 
so that it holds just the data from the last 30 days.

 Per-CF TTL
 --

 Key: CASSANDRA-3974
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
 Project: Cassandra
  Issue Type: New Feature
Affects Versions: 1.2
Reporter: Jonathan Ellis
Assignee: Kirk True
Priority: Minor
 Fix For: 1.2

 Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, 
 trunk-3974v4.txt


 Per-CF TTL would allow compaction optimizations (drop an entire sstable's 
 worth of expired data) that we can't do with per-column.
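
One way to picture the compaction optimization described above: if every column in a column family shares one TTL, an sstable whose newest write is older than that TTL contains only expired data and can be dropped whole. The sketch below is an assumption-laden illustration, not Cassandra code; `TtlDropSketch`, `SSTableInfo` and `maxWriteTimeSeconds` are hypothetical names, and it ignores tombstones, gc grace and overlapping sstables.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used for illustration only.
class TtlDropSketch {
    // Hypothetical stand-in for per-sstable metadata.
    static final class SSTableInfo {
        final String name;
        final long maxWriteTimeSeconds; // newest write contained in the sstable

        SSTableInfo(String name, long maxWriteTimeSeconds) {
            this.name = name;
            this.maxWriteTimeSeconds = maxWriteTimeSeconds;
        }
    }

    // Returns the names of sstables whose newest write has already
    // outlived the column-family-wide TTL, so all of their data is expired.
    static List<String> fullyExpired(List<SSTableInfo> sstables,
                                     long cfTtlSeconds, long nowSeconds) {
        List<String> droppable = new ArrayList<>();
        for (SSTableInfo sstable : sstables) {
            if (sstable.maxWriteTimeSeconds + cfTtlSeconds <= nowSeconds) {
                droppable.add(sstable.name);
            }
        }
        return droppable;
    }
}
```

With per-column TTLs this check is not possible from one timestamp alone, because a single long-lived column would keep the whole sstable alive.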


[jira] [Commented] (CASSANDRA-4482) In-memory merkle trees for repair

2012-08-02 Thread Mike Bulman (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427489#comment-13427489
 ] 

Mike Bulman commented on CASSANDRA-4482:


fwiw, continuous repair is on the roadmap for DataStax OpsCenter, so taking 
advantage of incremental repair will be extremely simple from an end user 
standpoint.

 In-memory merkle trees for repair
 -

 Key: CASSANDRA-4482
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4482
 Project: Cassandra
  Issue Type: New Feature
Reporter: Marcus Eriksson

 this sounds cool, we should reimplement it in the open source cassandra;
 http://www.acunu.com/2/post/2012/07/incremental-repair.html


[jira] [Updated] (CASSANDRA-4484) Drain causes incorrect error messages: Stream took more than 24H to complete; skipping


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-4484:
--

Affects Version/s: (was: 1.1.1)
   1.1.0
Fix Version/s: 1.1.4
 Assignee: Jonathan Ellis

(Introduced in CASSANDRA-3679 for 1.1.0.)

 Drain causes incorrect error messages: Stream took more than 24H to 
 complete; skipping
 

 Key: CASSANDRA-4484
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4484
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Christopher Porter
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 1.1.4




[jira] [Resolved] (CASSANDRA-4484) Drain causes incorrect error messages: Stream took more than 24H to complete; skipping


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-4484.
---

Resolution: Fixed
  Reviewer: jbellis
  Assignee: Christopher Porter  (was: Jonathan Ellis)

patch lgtm, committed.  thanks!

 Drain causes incorrect error messages: Stream took more than 24H to 
 complete; skipping
 

 Key: CASSANDRA-4484
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4484
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Christopher Porter
Assignee: Christopher Porter
Priority: Minor
 Fix For: 1.1.4




[Cassandra Wiki] Update of VirtualNodes/Balance by EricEvans

Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The VirtualNodes/Balance page has been changed by EricEvans:
http://wiki.apache.org/cassandra/VirtualNodes/Balance?action=diff&rev1=5&rev2=6

Comment:
shuffles as long running tasks

  
   * Shuffling a node at a time means that for each node i for i in 0..N-1 
(where N is the cluster size), i/N of the ranges shuffled will, on average, 
have been shuffled at least once already. So it's substantially less efficient 
than shuffling once, then assigning the vnodes out in one cluster-wide pass. 
-- ''Jonathan 
Ellis''FootNote([[https://issues.apache.org/jira/browse/CASSANDRA-4443?focusedCommentId=13423505&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13423505|CASSANDRA-4443#comment-13423505]])
  
+  * Shuffling will entail moving a ''lot'' of data around the cluster and so 
has the potential to consume a lot of disk and network I/O, and to take a 
considerable amount of time.  For this to be an online operation, the shuffle 
will need to operate on a lower priority basis to other streaming operations, 
and should be expected to take days or weeks to complete.
+ 
  === Nodes / Cluster ===
  The most straightforward method of effecting ownership is a token move (i.e. 
relocating a range from one node to another).  Exposing this with JMX would 
allow implementing all of the required operations client-side.

[Cassandra Wiki] Update of VirtualNodes/Balance by JonathanEllis

Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The VirtualNodes/Balance page has been changed by JonathanEllis:
http://wiki.apache.org/cassandra/VirtualNodes/Balance?action=diff&rev1=6&rev2=7

  
   * Shuffling will entail moving a ''lot'' of data around the cluster and so 
has the potential to consume a lot of disk and network I/O, and to take a 
considerable amount of time.  For this to be an online operation, the shuffle 
will need to operate on a lower priority basis to other streaming operations, 
and should be expected to take days or weeks to complete.
  
+  * Corollary: shuffling should tell the operator what vnodes it plans to move 
where, and report progress whenever one completes successfully.  This will 
allow recovering from an interrupted shuffle, if necessary.
+ 
+  * Shuffling can be sped up by parallelizing such that each node has one 
vnode moving to or from it at a time.  With appropriate stream throttling this 
should be better than just one vnode at a time cluster-wide.
+ 
  === Nodes / Cluster ===
  The most straightforward method of effecting ownership is a token move (i.e. 
relocating a range from one node to another).  Exposing this with JMX would 
allow implementing all of the required operations client-side.
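
The parallelization idea from the shuffling notes above (each node has at most one vnode moving to or from it at a time) can be sketched as a greedy grouping of moves into rounds. This is only one possible reading of that bullet, not a proposed implementation; `ShuffleRounds`, `Move` and `plan` are hypothetical names, and stream throttling and progress reporting are left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class name, used for illustration only.
class ShuffleRounds {
    // Hypothetical description of one planned vnode relocation.
    static final class Move {
        final String vnode, from, to;

        Move(String vnode, String from, String to) {
            this.vnode = vnode;
            this.from = from;
            this.to = to;
        }
    }

    // Greedily places each move into the first round in which neither
    // its source nor its target node is already involved in a move.
    static List<List<Move>> plan(List<Move> moves) {
        List<List<Move>> rounds = new ArrayList<>();
        List<Set<String>> busyNodes = new ArrayList<>();
        for (Move move : moves) {
            int i = 0;
            while (i < rounds.size()
                    && (busyNodes.get(i).contains(move.from)
                        || busyNodes.get(i).contains(move.to))) {
                i++;
            }
            if (i == rounds.size()) {
                rounds.add(new ArrayList<>());
                busyNodes.add(new HashSet<>());
            }
            rounds.get(i).add(move);
            busyNodes.get(i).add(move.from);
            busyNodes.get(i).add(move.to);
        }
        return rounds;
    }
}
```

Printing the planned rounds before starting, and again as each move completes, would give the operator the plan and progress information that the corollary asks for.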

[jira] [Commented] (CASSANDRA-3680) Add Support for Composite Secondary Indexes

2012-08-02 Thread Yuki Morishita (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427605#comment-13427605
 ] 

Yuki Morishita commented on CASSANDRA-3680:
---

I ran a couple of tests with the above _blogs_ CF (index is created on _author_).

First, insert 3 rows below:
{code}
cqlsh:3680> INSERT INTO blogs (blog_id, posted_at, author, content) VALUES (1, '2012-11-11', 'foo', 'bar');
cqlsh:3680> INSERT INTO blogs (blog_id, posted_at, author, content) VALUES (2, '2012-11-12', 'foo', 'baz');
cqlsh:3680> INSERT INTO blogs (blog_id, posted_at, author, content) VALUES (3, '2012-11-11', 'gux', 'quux');
cqlsh:3680> SELECT * FROM blogs;
 blog_id | posted_at                | author | content
---------+--------------------------+--------+---------
       1 | 2012-11-11 00:00:00-0600 |    foo |     bar
       2 | 2012-11-12 00:00:00-0600 |    foo |     baz
       3 | 2012-11-11 00:00:00-0600 |    gux |    quux
{code}

Select on indexed column works fine:
{code}
cqlsh:3680> SELECT * FROM blogs WHERE author='foo';
 blog_id | posted_at                | author | content
---------+--------------------------+--------+---------
       1 | 2012-11-11 00:00:00-0600 |    foo |     bar
       2 | 2012-11-12 00:00:00-0600 |    foo |     baz
{code}

But a query that combines the indexed column with a primary key column (2nd 
query below) is not working as expected:
{code}
cqlsh:3680> SELECT * FROM blogs WHERE posted_at='2012-11-11';
 blog_id | posted_at                | author | content
---------+--------------------------+--------+---------
       1 | 2012-11-11 00:00:00-0600 |    foo |     bar
       3 | 2012-11-11 00:00:00-0600 |    gux |    quux

cqlsh:3680> SELECT * FROM blogs WHERE posted_at='2012-11-11' AND author='foo';
 blog_id | posted_at                | author | content
---------+--------------------------+--------+---------
       1 | 2012-11-11 00:00:00-0600 |    foo |     bar
       2 | 2012-11-12 00:00:00-0600 |    foo |     baz
{code}
Here, I expected only the row with blog_id=1, but both 1 and 2 are returned.
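
To make the expected result explicit, the sketch below applies both predicates to the three inserted rows in plain Java. It only models the expected query semantics (both conditions must hold); it says nothing about how the index lookup is implemented, and `BlogQuerySketch`, `Blog` and `select` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used for illustration only.
class BlogQuerySketch {
    static final class Blog {
        final int blogId;
        final String postedAt, author, content;

        Blog(int blogId, String postedAt, String author, String content) {
            this.blogId = blogId;
            this.postedAt = postedAt;
            this.author = author;
            this.content = content;
        }
    }

    // The three rows inserted in the test above.
    static final List<Blog> ROWS = Arrays.asList(
            new Blog(1, "2012-11-11", "foo", "bar"),
            new Blog(2, "2012-11-12", "foo", "baz"),
            new Blog(3, "2012-11-11", "gux", "quux"));

    // Returns the blog_ids matching every non-null predicate;
    // a null argument means "no restriction on that column".
    static List<Integer> select(String postedAt, String author) {
        List<Integer> ids = new ArrayList<>();
        for (Blog row : ROWS) {
            boolean dateMatches = postedAt == null || row.postedAt.equals(postedAt);
            boolean authorMatches = author == null || row.author.equals(author);
            if (dateMatches && authorMatches) {
                ids.add(row.blogId);
            }
        }
        return ids;
    }
}
```

Under these semantics `select("2012-11-11", "foo")` yields only blog_id 1, whereas the patched query above behaves as if the `posted_at` restriction were ignored.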

 Add Support for Composite Secondary Indexes
 ---

 Key: CASSANDRA-3680
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3680
 Project: Cassandra
  Issue Type: Sub-task
Reporter: T Jake Luciani
Assignee: Sylvain Lebresne
  Labels: cql3, secondary_index
 Fix For: 1.2

 Attachments: 0001-Secondary-indexes-on-composite-columns.txt


 CASSANDRA-2474 and CASSANDRA-3647 add the ability to transpose wide rows 
 differently. For efficiency and functionality, the secondary index API needs 
 to be altered to allow composite indexes.  
 I think this will require the IndexManager API to have a 
 maybeIndex(ByteBuffer column) method that SS can call, and to implement a 
 PerRowSecondaryIndex per column that breaks the composite into parts and 
 indexes specific bits, also including the base rowkey.
 Then a search against a TRANSPOSED row or DOCUMENT will be possible.
  


[jira] [Commented] (CASSANDRA-4481) Commitlog not replayed after restart - data lost