[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2012-10-05 Thread Chris Chiappone (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470655#comment-13470655
 ] 

Chris Chiappone commented on CASSANDRA-2749:


This actually causes a bunch of problems for customers that have used longer 
the 48 character keyspace and cf names. How will this affect the upgrade path 
for those customers?

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.1.0

 Attachments: 0001-2749.patch, 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 0003-Fixes.patch, 
 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 
 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz, 2749.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2012-04-02 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244746#comment-13244746
 ] 

Jonathan Ellis commented on CASSANDRA-2749:
---

Did some more research on path limitations:

NTFS is technically okay with paths up to 32K long[1], but the windows api is 
limited to 256[2].  Common Linux filesystems have a limit of 255 bytes per path 
*component* (i.e. directory or filename) but no total path limit.  However, 
Linux defines PATH_MAX and FILENAME_MAX, both 4096. [3]

[1] http://en.wikipedia.org/wiki/Comparison_of_file_systems
[2] http://msdn.microsoft.com/en-us/library/aa365247.aspx
[3] http://serverfault.com/questions/9546/filename-length-limits-on-linux

In short: restricting KS and CF names to 32 characters is a good idea for the 
benefit of Windows portability.  However, we may want to exempt Linux systems 
from the startup length check to allow easier upgrades.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.1.0

 Attachments: 0001-2749.patch, 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749.tar.gz, 
 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 
 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2012-01-03 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178822#comment-13178822
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


+1 with last nit - Directories.SSTableLister methods 
skip{Compacted,Temporary}(boolean) and includeBackups(boolean) ignore argument 
and always set corresponding option to true.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 0001-2749.patch, 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749.tar.gz, 
 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 
 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-23 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175390#comment-13175390
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


bq. I'm not sure I see what this one is. Are we talking of the migration 
process?

I was testing it like this : 

# run 1.1 *without* modifications
# ./tools/stress/bin/stress -n 5 -S 512 -x KEYS
# ./bin/nodetool -h localhost flush Keyspace1 Standard1
# ./bin/nodetool -h localhost snapshot Keyspace1
# made sure that Standard1.Idx-* SSTables are in the snapshots/timestamp 
directory
# run 1.1 *with* you patch applied
# checked if snapshots directory was moved and what files did it include - 
it was lucking Standard1.Idx-* files
# cleaned data directory
# repeated steps 1 - 5 but this time *with* your patch applied and it 
didn't include Standard1.Idx-* into snapshot

bq. Maybe it is more natural to have secondary indexes sstables be in the 
same directory than the base cfs? 

+1


 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 0001-2749.patch, 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749.tar.gz, 
 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 
 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-23 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175447#comment-13175447
 ] 

Sylvain Lebresne commented on CASSANDRA-2749:
-

Weird. I just tried the same scenario and everything worked correctly. I should 
mention that when moving the snapshots/backups, the migration process rename 
them to the new filename convention, so they will be called 
Keyspace1-Standard1.Idx-*. Or maybe I fixed it with the last version of the 
patch without realizing it.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 0001-2749.patch, 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749.tar.gz, 
 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 
 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-23 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175460#comment-13175460
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


That my be the case :) I will re-test as part of the review anyway.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 0001-2749.patch, 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749.tar.gz, 
 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 
 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-22 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175070#comment-13175070
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


Thanks, Sylvain! We are getting really close :)

Here are problems I found:

- o.a.c.db.Directories comment should be updated because it still uses SSTable 
file name without keyspace.
- o.a.c.io.sstable.SSTableReaderTest won't compile
- if you start with empty data directory you get following exception and 
process exits
{code}
 INFO 23:51:11,987 Upgrade from pre-1.1 version detected: migrating sstables to 
new directory layout
ERROR 23:51:11,988 Exception encountered during startup
java.lang.NullPointerException
at 
org.apache.cassandra.db.Directories.migrateSSTables(Directories.java:416)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:164)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:360)
at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
java.lang.NullPointerException
at 
org.apache.cassandra.db.Directories.migrateSSTables(Directories.java:416)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:164)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:360)
at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
Exception encountered during startup: null
{code}

- on snapshot doesn't create or move (from older schema) index SSTables related 
to CF
- shouldn't old snapshots directory be removed after move? 


 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 0001-2749-v2.patch, 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-add-new-directory-layout.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests-v2.patch, 0002-fix-unit-tests.patch, 2749.tar.gz, 
 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 
 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170364#comment-13170364
 ] 

Jonathan Ellis commented on CASSANDRA-2749:
---

bq. we cannot stream between two nodes, one using separate cf directory

I don't see any reason to continue to support the old-style directory layout.  
That adds complexity (operationally as well as in the code) for no benefit that 
I can think of.  I think we should migrate from old layout to new on the first 
startup under 1.1.

bq. regarding keyspaces in file names, sure, why not, guess having a header 
with this info in the file is out of the question, then the only meta data we 
have is the file name, right? A problem could be if we want to do 
CASSANDRA-1983 later, that would increase the file name length even more

I'm on the fence here -- on the one hand having ks + cf in the filename 
simplifies some things.  On the other hand, we allow arbitrary-length KS + CF 
names (up to 64K iirc) so UUID aside we're already in trouble on ext3/ext4, 
xfs, and ntfs, which all support max filename length of ~256.  I'm starting to 
think we should move these into the metadata component instead of the filename.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-add-new-directory-layout.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 
 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 
 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-15 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170385#comment-13170385
 ] 

Sylvain Lebresne commented on CASSANDRA-2749:
-

{quote}
On the other hand, we allow arbitrary-length KS + CF names (up to 64K iirc) so 
UUID aside we're already in trouble on ext3/ext4, xfs, and ntfs, which all 
support max filename length of ~256. I'm starting to think we should move these 
into the metadata component instead of the filename.
{quote}

The thing with the metadata component is that from a code perspective, there is 
lots of places where we want to create a Descriptor, which involves extracting 
the keyspace/cf names only based on the filename. Adding the necessity to 
locate and read the metadata in those places will likely don't be very fun.

So I'd be in favor of just limiting the keyspace and column family names. It's 
one for which there is no real point to have very long names. Limiting each one 
to 32 characters shouldn't be a strong limitation.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-add-new-directory-layout.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 
 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 
 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-15 Thread Marcus Eriksson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170467#comment-13170467
 ] 

Marcus Eriksson commented on CASSANDRA-2749:


sounds great (both just supporting new-style-layout and limiting names to 
32chars)

guess we need to supply a tool to rename sstables files if anyone is on longer 
names? and rolling upgrades are out of the question then right? (maybe the 
already are?)

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-add-new-directory-layout.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 
 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 
 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-15 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170475#comment-13170475
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


I think if we would be just supporting new style layout we can convert on 
startup. +1 on both ideas tho.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-add-new-directory-layout.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 
 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 
 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-15 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170480#comment-13170480
 ] 

Sylvain Lebresne commented on CASSANDRA-2749:
-

bq. guess we need to supply a tool to rename sstables files if anyone is on 
longer names?

We probably don't need to do anything. I don't think anyone is really using 
names long enough for them to it the file system limit, the goal of limiting 
the names is just so to prevent this from happening but there will be no other 
assumption that the names are short from the code.

I also don't think anything will prevent rolling upgrades, do you had something 
in mind?

Note: I have a long flight ahead of me so I plan to update my last patch with 
both those changes, as I still like the moving of all the directories handling 
in a dedicated class, even if we don't support both layout.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-add-new-directory-layout.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 
 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 
 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170487#comment-13170487
 ] 

Jonathan Ellis commented on CASSANDRA-2749:
---

It might be worth adding a are my filenames going to be too large check 
against all KS + CF combinations before starting to migrate data files around, 
though.  It would suck to end up with a partially converted database if some 
short CF names complete early on, before erroring out on a long one.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-add-new-directory-layout.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 
 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 
 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-06 Thread Marcus Eriksson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163570#comment-13163570
 ] 

Marcus Eriksson commented on CASSANDRA-2749:


agree that the file structure is all over the place, that's why this patch was 
so incredibly painful to do

seems i only did the disk space checking in sub directories in the backwards 
compatible patch

i had on algorithm for detecting what kind of file it is in the old backwards 
compatible patch, it iterates over all data directories and figures out which 
data directory the file is in, then it knows the keyspace is the next part of 
the filename, and can check if there are files or directories in that directory.

but i agree, it is incredibly ugly, your modifications make it alot better, and 
yes, this would be a good time to do this properly, not like me just trying to 
do a minimal patch that works.

regarding keyspaces in file names, sure, why not, guess having a header with 
this info in the file is out of the question, then the only meta data we have 
is the file name, right? A problem could be if we want to do CASSANDRA-1983 
later, that would increase the file name length even more.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-add-new-directory-layout.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 
 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 
 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-06 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163571#comment-13163571
 ] 

Sylvain Lebresne commented on CASSANDRA-2749:
-

bq. i had on algorithm for detecting what kind of file it is in the old 
backwards compatible patch, it iterates over all data directories and figures 
out which data directory the file is in, then it knows the keyspace is the next 
part of the filename, and can check if there are files or directories in that 
directory.

I tried that too, but this doesn't work when say you're opening a sstable by 
the sstableloader, because the files are not in a data directories then. 

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-add-new-directory-layout.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 
 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 
 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-11-22 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155218#comment-13155218
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


+1 first few additions:
 - You forgot to set `chmod +x ./bin/sstablemover`.
 - Good idea would be to use `mv` instead of `cp` when -d option is specified.


 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 2749.tar.gz, 2749_backwards_compatible_v1.patch, 
 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 
 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-11-19 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153620#comment-13153620
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


Thanks a lot for taking care about this, Marcus!

I have tested your latest patch using stress and sstablemover and then stress 
(+ flush/scrub/snapshot) again for Standard (+ secondary indexes) and Super 
ColumnFamilies and everything worked just fine for me. Here are my last 
comments and I think after this gets done we are ready to push your changes:

1). a). Tool should be made more user-friendly by adding at least --help and 
add an option to move sstables back to the keyspace directory (as we have an 
two states in the config);
b). I think it should delete old sstables after copy automatically or by 
asking user using --delete-old option?
c). Move sstablemover tool the ./bin instead of ./tools/sstablemover.

2). LegacySSTableTest fails on my machine can you, please, check that?

3). I think we should extend comment in the config describing what should be 
done before it is safe to change to change the option e.g. Note: before 
setting this option to 'true' you should move existing SSTable files to the 
separate directories manually by using ./bin/sstablemover tool, the same 
applies for 'false' state - ./bin/sstablemover could be used to restore 
original directory structure (Please use ./bin/sstablemover --help for help and 
information)  
 
Also please remove/implement remaining // TODO: comments.


 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 2749.tar.gz, 2749_backwards_compatible_v1.patch, 
 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 
 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-11-08 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13146556#comment-13146556
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


I don't really like the idea of keeping a map of all generates for each of 
ks/cf and updating it all the time (this also implies that we need to resize it 
after cf is created/dropped). After re-thinking this once again it feels like 
we really should just make a tool to transfer SSTable to separate directories 
and back, make a flag in config that will indicate old/new style of file 
structure, because I don't see a error-prone way to handle this when old/new 
style SSTables are mixed...

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Marcus Eriksson
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749.tar.gz, 
 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 
 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-11-04 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13143863#comment-13143863
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


v3 looks good (needs rebase), here is my comments:

 - DatabaseDescriptor.getPerCFDirectory() should be renamed to something like 
useSeparateCFDirectories
 - in the ColumnFamilyStory.accept method file name should be checked depending 
on case because we can CF names are case-sensitive.
 - we don't have ColumnFamilyStore.rename method anymore which is a good thing 
for this patch, you can just rebase and remove related code.
 - in the Table.snapshotExists I think we should be more careful determining if 
snapshot actually exists.
 - TODO should be cleaned up.

Wish: if it's possible I think we can remove ColumnFamily name from the SSTable 
files if those are in the CF directory already.

I think we are almost done in here.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Marcus Eriksson
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 
 2749_backwards_compatible_v3.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-10-14 Thread Marcus Eriksson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127329#comment-13127329
 ] 

Marcus Eriksson commented on CASSANDRA-2749:


Biggest part left is figuring out how to estimate disk space to know where to 
write an sstable, ill work on that this weekend

Other than that small cleanups (codestyle), unit tests and adding the config 
parameter

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-10-13 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126899#comment-13126899
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


Looks better now but I don't like how you changed DescriptorTest especially 
testLegacy() userActionUtilsKey there is meant to be a CF name, so it seems 
like current version does not work correctly?

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-10-13 Thread Marcus Eriksson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126910#comment-13126910
 ] 

Marcus Eriksson commented on CASSANDRA-2749:


ah that is a leftover from my first non-backwards-compatible patch

it works even without the cf in the new File(...)

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-10-13 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126980#comment-13126980
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


I forgot to mention - would be great to see a test for RecursiveGlob and please 
put a license banner on top of that file like we do everywhere else.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-10-10 Thread Marcus Eriksson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13124288#comment-13124288
 ] 

Marcus Eriksson commented on CASSANDRA-2749:


Anyone got a suggestion on how to do the available disk space checking? Reason 
in that now we cannot simply check which dataFileLocation that has the most 
available disk space, we need to check the subdirs where files are actually 
written.

My naive first approach would be to simply change  
DatabaseDescriptor.getDataFileLocationForTable(..) to take a column family 
name, which we ought to always know when calling that method (and a quick check 
seems to say that only compactions might prove a problem for this approach).

Comments?

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-10-10 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13124330#comment-13124330
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


Sorry Marcus I had no time to look at this but I will by the end of this week.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-10-05 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121019#comment-13121019
 ] 

Jonathan Ellis commented on CASSANDRA-2749:
---

Yes, trunk is the right place for this.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-10-01 Thread Marcus Eriksson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13118696#comment-13118696
 ] 

Marcus Eriksson commented on CASSANDRA-2749:


It is not backwards compatible, i figured it was not worth the extra 
complexity, i imagine the upgrade-path to be:
1. shut down node
2. edit config file
3. symlink in SSDs and move the files into the subdirectories
4. start node

Of course, this could be done on startup by cassandra itself, but providing a 
shell-script that does it keeps it simple, what do you think? I really dont 
want the complexity with sstable files in two places at the same time.

My solution simply changed the Descriptor class to return paths to sstable 
files in subdirs, and then fix everything that was affected. Many of the 
classes modified were due to assumptions in the unit tests. I'm currently 
refactoring the Descriptor class to make it cleaner and a central place to glob 
for files etc. Expect a patch tonight or tomorrow.

I will of course change DatabaseDescriptor.getPerCFDirectory() to read the 
config file, kept it hard coded to make testing simpler.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-10-01 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13118720#comment-13118720
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


This should be done like compression which compresses only newly created 
SSTables, we don't want to make users to stop their node for indefinite time to 
change config and move files.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-10-01 Thread Marcus Eriksson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13118859#comment-13118859
 ] 

Marcus Eriksson commented on CASSANDRA-2749:


Ok, i'll update the patch

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-10-01 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13118864#comment-13118864
 ] 

Jonathan Ellis commented on CASSANDRA-2749:
---

I don't think taking compression as a model makes sense. With compression, it's 
fine to set the option globally and each node can start using compression when 
it gets the schema update. But this affects the storage engine below schema.

I think it's reasonable to make the change offline, especially if an upgrade 
tool is provided to make it simple.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-10-01 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13118868#comment-13118868
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


On my opinion it's more user-friendly if we support old directory structure for 
existing files and let compaction do replace for us.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-10-01 Thread Marcus Eriksson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13118867#comment-13118867
 ] 

Marcus Eriksson commented on CASSANDRA-2749:


agree, gonna look at how much pain it is to actually build though

wont need alot of downtime if it is done offline, can rsync files into the 
subdir right before stopping the node



 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-09-30 Thread Marcus Eriksson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13118017#comment-13118017
 ] 

Marcus Eriksson commented on CASSANDRA-2749:


btw, yes, i did include the data files in the patch on purpose, they need to be 
in subdirs for some of the unit tests to work

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-09-30 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13118454#comment-13118454
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


Thanks for your work, Marcus! First of all, can you please write your algorithm 
down so people can participate without reading actual code?

I don't think that this is the right way to go because I can't find how does it 
manage to keep backward compatibility with current directory structure (and 
that test SSTables should be moved is yet another confirmation) that would 
imply major change in the Descriptor/Component classes, all the current changes 
relay on DatabaseDescriptor.getPerCFDirectory() which is static true which 
makes it redundant. In my vision of things major changes should be made only in 
Descriptor, Component, ColumnFamilyStore and SSTable classes to change the way 
they create/lookup file locations and do backups and snapshoting.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-08-16 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085812#comment-13085812
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


After crushing my head again this for a few days I can say that this is more 
complicated that it sounds for a few reasons:

 - We will need to support both old/new directory structures which requires 
major changes in the way how Descriptor class works and how CFS and SSTable 
classes do file lookup and path generation.
 - Adds additional complexity to the way how we do backups, snapshots and 
recover which could potentially lead to some nasty bugs.
 - As Peter already mentioned Cassandra won't be able to distinguish between 
an actual empty CF and a directory that wasn't mounted (or a symlink pointing 
to a non-mounted directory).

There are more but those mentioned are major ones. Let's skip this for now.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
Priority: Minor

 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-08-09 Thread Chris Burroughs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081679#comment-13081679
 ] 

Chris Burroughs commented on CASSANDRA-2749:


It would also be cool (but this is obviously speculative) to have the ability 
to keep Index files on an SSD, and the larger data files on rotating disks.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 1.0


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-06-08 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045953#comment-13045953
 ] 

Jonathan Ellis commented on CASSANDRA-2749:
---

We could also have a memory location that would be useful for temporary data.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor

 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-06-08 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045956#comment-13045956
 ] 

Jonathan Ellis commented on CASSANDRA-2749:
---

There's some tension between managing this cluster-wide and the actual data 
directory definitions being per-machine. Not sure what the best solution there 
is.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor

 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-06-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046045#comment-13046045
 ] 

Héctor Izquierdo commented on CASSANDRA-2749:
-

What about being configurable in a separate file like the network topology? 
Could that work as a first approximation?

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor

 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-06-08 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046069#comment-13046069
 ] 

Ryan King commented on CASSANDRA-2749:
--

Since each keyspace is stored in a different sub-directory of the 
DataDiretories, you can already split the storage of different keyspaces with 
some clever mount options. Maybe we could give column families the same 
treatment?

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor

 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-06-08 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046093#comment-13046093
 ] 

Jonathan Ellis commented on CASSANDRA-2749:
---

Ryan's idea sounds like the simplest way to get something good enough to me.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor

 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-06-08 Thread Peter Schuller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046102#comment-13046102
 ] 

Peter Schuller commented on CASSANDRA-2749:
---

+1 on that. We have been discussing the same thing, for the same purpose. The 
only kink is that you don't want to do something like having a per-cf setting 
that is tied to local node details like paths. But simply placing CF:s in a 
named subdirectory (similar to the pg tablespace) which can, on a per-node 
basis, by a symlink or a mountpoint, avoids that.

This means there's no problem doing a rolling re-configuration of a cluster, 
and there is no need to realize before hand that you might want to move some 
particular CF and do something like assign it to a tablespace (to get the level 
of indirection). It all just works by default, and you can move CF:s at any 
time on any node without co-ordination other than the node being down for a bit.

I can foresee it being easier to accidentally start a node which seems to work 
but has some CF:s be completely empty, because Cassandra won't be able to 
distinguish between an actual empty CF and a directory that wasn't mounted (or 
a symlink pointing to a non-mounted directory). Something simple like creating 
a marker of some kind on CF creation might help with that; on start-up CF:s 
that are missing the marker could be rejected. But - I suppose this is overkill 
at least initially.


 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor

 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira