[jira] [Commented] (CASSANDRA-9387) Add snitch supporting Windows Azure
[ https://issues.apache.org/jira/browse/CASSANDRA-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970263#comment-14970263 ] Matt Kennedy commented on CASSANDRA-9387: - This item is waiting on the fault-domain and update-domain data to be exposed via the Instance Metadata Service similar to how the instance event metadata is exposed in this article: https://azure.microsoft.com/en-us/blog/what-just-happened-to-my-vm-in-vm-metadata-service/ > Add snitch supporting Windows Azure > --- > > Key: CASSANDRA-9387 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9387 > Project: Cassandra > Issue Type: New Feature > Components: Config >Reporter: Jonathan Ellis >Assignee: Matt Kennedy > Fix For: 2.1.x > > > Looks like regions / fault domains are a pretty close analogue to C* > DCs/racks. > http://blogs.technet.com/b/yungchou/archive/2011/05/16/window-azure-fault-domain-and-update-domain-explained-for-it-pros.aspx -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9048) Delimited File Bulk Loader
[ https://issues.apache.org/jira/browse/CASSANDRA-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391221#comment-14391221 ] Matt Kennedy commented on CASSANDRA-9048: - I'd like to advocate for this loader being part of core Cassandra. Bulk loading is a fundamental task for any database, and database operators need multiple strategies to address the task. Not only does this tool meet the need in the most efficient way that has been identified so far, but it also serves as sample code for users to customize to build their own efficient loaders. It isn't really as practical for end-users to try to learn how to do customized bulk loading the right way by examining the COPY operation. This tool is at least as useful as any code in the examples directory and applies to a broader set of Cassandra users. Since it happens to be a fully functioning tool though, it seems to make more sense for it to live under the tools directory. Delimited File Bulk Loader -- Key: CASSANDRA-9048 URL: https://issues.apache.org/jira/browse/CASSANDRA-9048 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Brian Hess Attachments: CASSANDRA-9048.patch There is a strong need for bulk loading data from delimited files into Cassandra. Starting with delimited files means that the data is not currently in the SSTable format, and therefore cannot immediately leverage Cassandra's bulk loading tool, sstableloader, directly. Data to be loaded is found in delimited files far more often than in the SSTable format itself, so a tool that loads from delimited files is very useful.
In order for this bulk loader to be more generally useful to customers, it should handle a number of options at a minimum:
- support specifying the input file or reading the data from stdin (so other command-line programs can pipe into the loader)
- supply the CQL schema for the input data
- support all data types other than collections (collections are a stretch goal/need)
- an option to specify the delimiter
- an option to specify comma as the decimal delimiter (for international use cases)
- an option to specify how NULL values are specified in the file (e.g., the empty string or the string NULL)
- an option to specify how BOOLEAN values are specified in the file (e.g., TRUE/FALSE or 0/1)
- an option to specify the Date and Time format
- an option to skip some number of rows at the beginning of the file
- an option to only read in some number of rows from the file
- an option to indicate how many parse errors to tolerate
- an option to specify a file that will contain all the lines that did not parse correctly (up to the maximum number of parse errors)
- an option to specify the CQL port to connect to (with 9042 as the default).
Additional options would be useful, but this set of options/features is a start.
A word on COPY. COPY comes via CQLSH, which requires the client to be the same version as the server (e.g., 2.0 CQLSH does not work with 2.1 Cassandra, etc). This tool should be able to connect to any version of Cassandra (within reason). For example, it should be able to handle 2.0.x and 2.1.x. Moreover, CQLSH's COPY command does not support a number of the options above. Lastly, the performance of COPY in 2.0.x is not high enough to be considered a bulk ingest tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
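Several of the parsing options requested in CASSANDRA-9048 can be illustrated with a small sketch. This is not the code in CASSANDRA-9048.patch; the option names (`null_string`, `decimal_comma`, `skip_rows`, and so on) and the type labels are hypothetical, and the sketch only parses rows in memory rather than writing to Cassandra.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ParseOptions:
    # Every option name below is hypothetical, chosen to mirror the requested options.
    delimiter: str = ","
    null_string: str = ""
    true_string: str = "TRUE"
    false_string: str = "FALSE"
    decimal_comma: bool = False
    skip_rows: int = 0
    max_rows: int = -1   # -1 means "no limit"
    max_errors: int = 10


def parse_value(raw, kind, opts):
    """Convert one field, honouring the NULL, BOOLEAN and decimal-comma options."""
    if raw == opts.null_string:
        return None
    if kind == "boolean":
        if raw == opts.true_string:
            return True
        if raw == opts.false_string:
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    if kind == "double":
        return float(raw.replace(",", ".") if opts.decimal_comma else raw)
    if kind == "int":
        return int(raw)
    return raw  # anything else is treated as text


def load(stream, column_kinds, opts):
    """Return (rows, bad_lines); raise once more than max_errors lines fail to parse."""
    rows, bad = [], []
    reader = csv.reader(stream, delimiter=opts.delimiter)
    for index, fields in enumerate(reader):
        if index < opts.skip_rows:
            continue
        if opts.max_rows >= 0 and len(rows) >= opts.max_rows:
            break
        try:
            if len(fields) != len(column_kinds):
                raise ValueError("wrong number of fields")
            rows.append(tuple(parse_value(f, k, opts)
                              for f, k in zip(fields, column_kinds)))
        except ValueError:
            # Rejected lines are rebuilt from their fields so they can be written to a reject file.
            bad.append(opts.delimiter.join(fields))
            if len(bad) > opts.max_errors:
                raise RuntimeError("too many parse errors")
    return rows, bad
```

Because `load` accepts any text stream, the same function would serve a named file or `sys.stdin`, which is what lets other command-line programs pipe into a loader of this shape.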
[jira] [Created] (CASSANDRA-8489) Restrict table visibility across keyspaces
Matt Kennedy created CASSANDRA-8489: --- Summary: Restrict table visibility across keyspaces Key: CASSANDRA-8489 URL: https://issues.apache.org/jira/browse/CASSANDRA-8489 Project: Cassandra Issue Type: Improvement Reporter: Matt Kennedy Priority: Minor This ticket is to capture a specific fine-grained authorization request: users should be able to be restricted to seeing only the tables in specific keyspaces. For example, given keyspaces K1 and K2 and users U1 and U2, allow U1 access to K1, but no access to even see table names in K2. Allow U2 access to K2, but no access to see table names in K1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
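The K1/K2 example in CASSANDRA-8489 amounts to filtering schema metadata by per-user keyspace grants. A minimal sketch of that rule follows; the `schema` and `grants` dictionaries and the table names are hypothetical stand-ins and do not reflect how Cassandra stores schema or permissions.

```python
def visible_tables(schema, grants, user):
    """Return {keyspace: [tables]} limited to keyspaces granted to `user`.

    `schema` maps keyspace name -> list of table names and `grants` maps
    user -> set of keyspace names; both are hypothetical stand-ins for real metadata.
    """
    allowed = grants.get(user, set())
    return {ks: list(tables) for ks, tables in schema.items() if ks in allowed}


# Hypothetical data matching the example in the ticket.
schema = {"K1": ["orders", "users"], "K2": ["payroll"]}
grants = {"U1": {"K1"}, "U2": {"K2"}}
```

With this data, U1 sees only the tables in K1 and U2 sees only the table in K2; a user with no grants sees nothing, not even table names.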
[jira] [Commented] (CASSANDRA-7631) Allow Stress to write directly to SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14077730#comment-14077730 ] Matt Kennedy commented on CASSANDRA-7631: - Personally I think the current syntax is a massive improvement over the older version. It takes a little bit of time to work out, and a small handful of the options remain confusing, but overall it's a fairly clear system with useful help messages. If anything, some examples of different invocations (incantations?) would be useful, but I don't see a reason to massively change it. Allow Stress to write directly to SSTables -- Key: CASSANDRA-7631 URL: https://issues.apache.org/jira/browse/CASSANDRA-7631 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Russell Alexander Spitzer Assignee: Russell Alexander Spitzer One common difficulty with benchmarking machines is the amount of time it takes to initially load data. For machines with a large amount of RAM this becomes especially onerous because a very large amount of data needs to be placed on the machine before the page cache can be circumvented. To remedy this, I suggest we add a top-level flag to cassandra-stress which would cause the tool to write directly to sstables rather than actually performing CQL inserts. Internally this would use CQLSSTableWriter to write directly to sstables while skipping any keys which are not owned by the node stress is running on. The same stress command run on each node in the cluster would then write unique sstables containing only the data which that node is responsible for. Following this, no further network IO would be required to distribute data, as it would all already be correctly in place. -- This message was sent by Atlassian JIRA (v6.2#6252)
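The idea in the CASSANDRA-7631 description, that each node runs the same command and keeps only the keys it owns, can be sketched as below. The sketch makes several simplifying assumptions that are not from the ticket: one token per node (no vnodes), a replication factor of 1, and an MD5-derived token as a stand-in for the real partitioner. The `Ring` class and function names are hypothetical.

```python
import bisect
import hashlib


def token_for(key: bytes) -> int:
    # Stand-in partitioner: an MD5-derived integer, not Cassandra's actual token function.
    return int.from_bytes(hashlib.md5(key).digest(), "big")


class Ring:
    def __init__(self, node_tokens):
        # node_tokens maps node name -> the single token it owns (assumption: no vnodes).
        self._entries = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def owner(self, key: bytes) -> str:
        # The owner is the first node whose token is >= the key's token, wrapping around.
        i = bisect.bisect_left(self._tokens, token_for(key))
        return self._entries[i % len(self._entries)][1]


def keys_to_write_locally(ring, local_node, keys):
    """Keep only the keys owned by `local_node`; all other keys are skipped."""
    return [k for k in keys if ring.owner(k) == local_node]
```

Running `keys_to_write_locally` with the same key list on every node yields disjoint subsets whose union is the full key set, which is why no further network transfer would be needed afterwards.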
[jira] [Commented] (CASSANDRA-7631) Allow Stress to write directly to SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076459#comment-14076459 ] Matt Kennedy commented on CASSANDRA-7631: - Having a mechanism like this is extremely important for testing large scale clusters. We don't necessarily want/need to test a large scale ingest each time, so the sooner we can go from spinning up 100 nodes, to running a mixed workload, the better. If one invocation of stress can tell 100 stressd processes to write local SSTables according to the user defined yaml, that should be massively more efficient than running a write job. Allow Stress to write directly to SSTables -- Key: CASSANDRA-7631 URL: https://issues.apache.org/jira/browse/CASSANDRA-7631 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Russell Alexander Spitzer Assignee: Russell Alexander Spitzer -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7631) Allow Stress to write directly to SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076801#comment-14076801 ] Matt Kennedy commented on CASSANDRA-7631: - In many cases, we primarily care about mixed workloads, but those need a populated cluster to run on. So yes, writes are important, but mostly in the context of concurrent reads also happening. Allow Stress to write directly to SSTables -- Key: CASSANDRA-7631 URL: https://issues.apache.org/jira/browse/CASSANDRA-7631 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Russell Alexander Spitzer Assignee: Russell Alexander Spitzer -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7631) Allow Stress to write directly to SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076854#comment-14076854 ] Matt Kennedy commented on CASSANDRA-7631: - Yes, ideally formatted using your new user-defined schema stuff. I don't mean to speak for Russ, but we fleshed out this idea jointly. Allow Stress to write directly to SSTables -- Key: CASSANDRA-7631 URL: https://issues.apache.org/jira/browse/CASSANDRA-7631 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Russell Alexander Spitzer Assignee: Russell Alexander Spitzer -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7468) Add time-based execution to cassandra-stress
[ https://issues.apache.org/jira/browse/CASSANDRA-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Kennedy updated CASSANDRA-7468: Attachment: trunk-7468-rebase.patch Add time-based execution to cassandra-stress Key: CASSANDRA-7468 URL: https://issues.apache.org/jira/browse/CASSANDRA-7468 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Matt Kennedy Assignee: Matt Kennedy Priority: Minor Fix For: 2.1.1 Attachments: trunk-7468-rebase.patch, trunk-7468.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7468) Add time-based execution to cassandra-stress
[ https://issues.apache.org/jira/browse/CASSANDRA-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069268#comment-14069268 ] Matt Kennedy commented on CASSANDRA-7468: - Rebased to trunk. Changed '-d' parameter to '-duration'. Note, running this without the latest DataStax Java driver (2.1-beta2) results in some seemingly extraneous stack traces, but they don't seem to affect functionality. Add time-based execution to cassandra-stress Key: CASSANDRA-7468 URL: https://issues.apache.org/jira/browse/CASSANDRA-7468 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Matt Kennedy Assignee: Matt Kennedy Priority: Minor Fix For: 2.1.1 Attachments: trunk-7468-rebase.patch, trunk-7468.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7468) Add time-based execution to cassandra-stress
[ https://issues.apache.org/jira/browse/CASSANDRA-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069387#comment-14069387 ] Matt Kennedy commented on CASSANDRA-7468: - Thanks for the review, units are a welcome addition. I'm also relieved you got rid of the countInSeconds boolean to do it. I felt cheap doing it that way :-) Everything else looks good to me. Add time-based execution to cassandra-stress Key: CASSANDRA-7468 URL: https://issues.apache.org/jira/browse/CASSANDRA-7468 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Matt Kennedy Assignee: Matt Kennedy Priority: Minor Fix For: 2.1.1 Attachments: 7468v2.txt, trunk-7468-rebase.patch, trunk-7468.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7468) Add time-based execution to cassandra-stress
[ https://issues.apache.org/jira/browse/CASSANDRA-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056244#comment-14056244 ] Matt Kennedy commented on CASSANDRA-7468: - I've tried using the n, n method to get timed execution for the last week or so in two different stress testing environments in two cloud providers. Unfortunately, the test execution times have run over by a significant amount. The independent timing thread method in the patch gives much more consistent results in terms of executing for the specified execution time. If you don't have any strenuous objections, I would like to see this incorporated. Add time-based execution to cassandra-stress Key: CASSANDRA-7468 URL: https://issues.apache.org/jira/browse/CASSANDRA-7468 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Matt Kennedy Priority: Minor Attachments: trunk-7468.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
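The "independent timing thread" approach mentioned in this comment can be sketched in a few lines. This is a generic illustration of the technique and not the code in trunk-7468.patch; `run_for` and its parameters are hypothetical names.

```python
import threading
import time


def run_for(duration_seconds, operation):
    """Call `operation` repeatedly until a separate timer thread signals that time is up."""
    stop = threading.Event()
    # The timer runs on its own thread, so the deadline does not depend on how
    # long individual operations take or on how many have completed.
    timer = threading.Timer(duration_seconds, stop.set)
    timer.daemon = True
    timer.start()
    count = 0
    while not stop.is_set():
        operation()
        count += 1
    return count
```

The run overshoots the requested duration by at most the length of the operation in flight when the timer fires, whereas a run bounded by an iteration count stretches whenever iterations take longer than expected.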
[jira] [Commented] (CASSANDRA-7468) Add time-based execution to cassandra-stress
[ https://issues.apache.org/jira/browse/CASSANDRA-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056654#comment-14056654 ] Matt Kennedy commented on CASSANDRA-7468: - Sure. Add time-based execution to cassandra-stress Key: CASSANDRA-7468 URL: https://issues.apache.org/jira/browse/CASSANDRA-7468 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Matt Kennedy Priority: Minor Fix For: 2.1.1 Attachments: trunk-7468.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7468) Add time-based execution to cassandra-stress
[ https://issues.apache.org/jira/browse/CASSANDRA-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047906#comment-14047906 ] Matt Kennedy commented on CASSANDRA-7468: - Hm, that might be easy to do, but it isn't exactly obvious that it's possible. What if the patch were re-worked to expose the -d param to explicitly set duration, but instead of keeping a distinct timer thread, it's internally set to use the same execution path it would have if n30 n30? Add time-based execution to cassandra-stress Key: CASSANDRA-7468 URL: https://issues.apache.org/jira/browse/CASSANDRA-7468 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Matt Kennedy Priority: Minor Attachments: trunk-7468.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7468) Add time-based execution to cassandra-stress
[ https://issues.apache.org/jira/browse/CASSANDRA-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047956#comment-14047956 ] Matt Kennedy commented on CASSANDRA-7468: - I picked -d for duration because I think people think of -t being associated with threads. s(amples)= could work, but we'd need to explain that there is a connection between samples and time (specifically a second) that isn't immediately obvious. The current help text for n>, "Run at least this many iterations before accepting uncertainty convergence", doesn't make it clear that an iteration is a second. If you just go with a completely separate parameter for duration, then there's no need to change the language of the other help text just to make the time-based use case more obvious. But it's a simple feature; as long as I can _do_ time-based runs, I'm not that fussed about how they get done. Add time-based execution to cassandra-stress Key: CASSANDRA-7468 URL: https://issues.apache.org/jira/browse/CASSANDRA-7468 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Matt Kennedy Priority: Minor Attachments: trunk-7468.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7468) Add time-based execution to cassandra-stress
Matt Kennedy created CASSANDRA-7468: --- Summary: Add time-based execution to cassandra-stress Key: CASSANDRA-7468 URL: https://issues.apache.org/jira/browse/CASSANDRA-7468 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Matt Kennedy Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7468) Add time-based execution to cassandra-stress
[ https://issues.apache.org/jira/browse/CASSANDRA-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Kennedy updated CASSANDRA-7468: Attachment: trunk-7468.patch Add time-based execution to cassandra-stress Key: CASSANDRA-7468 URL: https://issues.apache.org/jira/browse/CASSANDRA-7468 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Matt Kennedy Priority: Minor Attachments: trunk-7468.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7416) Allow cassandra-stress to set timestamp for writes
Matt Kennedy created CASSANDRA-7416: --- Summary: Allow cassandra-stress to set timestamp for writes Key: CASSANDRA-7416 URL: https://issues.apache.org/jira/browse/CASSANDRA-7416 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Matt Kennedy Priority: Trivial This is just a convenience for testing and bulk loading prior to a mixed workload. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7416) Allow cassandra-stress to set timestamp for writes
[ https://issues.apache.org/jira/browse/CASSANDRA-7416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Kennedy updated CASSANDRA-7416: Attachment: trunk-7416.txt Allow cassandra-stress to set timestamp for writes -- Key: CASSANDRA-7416 URL: https://issues.apache.org/jira/browse/CASSANDRA-7416 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Matt Kennedy Priority: Trivial Attachments: trunk-7416.txt This is just a convenience for testing and bulk loading prior to a mixed workload. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7417) Allow network configuration on interfaces instead of addresses
[ https://issues.apache.org/jira/browse/CASSANDRA-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Kennedy updated CASSANDRA-7417: Attachment: trunk-7417.txt Allow network configuration on interfaces instead of addresses -- Key: CASSANDRA-7417 URL: https://issues.apache.org/jira/browse/CASSANDRA-7417 Project: Cassandra Issue Type: Improvement Components: Config, Core Reporter: Matt Kennedy Priority: Minor Attachments: trunk-7417.txt This patch adds two config elements to cassandra.yaml: listen_interface and rpc_interface. These can be used instead of their *_address counterparts to configure bind the addresses C* listens on. This capability can drastically simplify some deployment scenarios, especially in clouds which sometimes have quirky automation capabilities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7417) Allow network configuration on interfaces instead of addresses
[ https://issues.apache.org/jira/browse/CASSANDRA-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Kennedy updated CASSANDRA-7417: Description: This patch adds two config elements to cassandra.yaml: listen_interface and rpc_interface. These can be used instead of their _address counterparts to configure bind the addresses Cassandra listens on. This capability can drastically simplify some deployment scenarios, especially in clouds which sometimes have quirky automation capabilities. was: This patch adds two config elements to cassandra.yaml: listen_interface and rpc_interface. These can be used instead of their *_address counterparts to configure bind the addresses C* listens on. This capability can drastically simplify some deployment scenarios, especially in clouds which sometimes have quirky automation capabilities. Allow network configuration on interfaces instead of addresses -- Key: CASSANDRA-7417 URL: https://issues.apache.org/jira/browse/CASSANDRA-7417 Project: Cassandra Issue Type: Improvement Components: Config, Core Reporter: Matt Kennedy Priority: Minor Attachments: trunk-7417.txt This patch adds two config elements to cassandra.yaml: listen_interface and rpc_interface. These can be used instead of their _address counterparts to configure bind the addresses Cassandra listens on. This capability can drastically simplify some deployment scenarios, especially in clouds which sometimes have quirky automation capabilities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7417) Allow network configuration on interfaces instead of addresses
Matt Kennedy created CASSANDRA-7417: --- Summary: Allow network configuration on interfaces instead of addresses Key: CASSANDRA-7417 URL: https://issues.apache.org/jira/browse/CASSANDRA-7417 Project: Cassandra Issue Type: Improvement Components: Config, Core Reporter: Matt Kennedy Priority: Minor Attachments: trunk-7417.txt This patch adds two config elements to cassandra.yaml: listen_interface and rpc_interface. These can be used instead of their *_address counterparts to configure bind the addresses C* listens on. This capability can drastically simplify some deployment scenarios, especially in clouds which sometimes have quirky automation capabilities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7417) Allow network configuration on interfaces instead of addresses
[ https://issues.apache.org/jira/browse/CASSANDRA-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Kennedy updated CASSANDRA-7417: Description: This patch adds two config elements to cassandra.yaml: listen_interface and rpc_interface. These can be used instead of their _address counterparts to bind the addresses Cassandra listens on. This capability can drastically simplify some deployment scenarios, especially in clouds which sometimes have quirky automation capabilities. was: This patch adds two config elements to cassandra.yaml: listen_interface and rpc_interface. These can be used instead of their _address counterparts to configure bind the addresses Cassandra listens on. This capability can drastically simplify some deployment scenarios, especially in clouds which sometimes have quirky automation capabilities. Allow network configuration on interfaces instead of addresses -- Key: CASSANDRA-7417 URL: https://issues.apache.org/jira/browse/CASSANDRA-7417 Project: Cassandra Issue Type: Improvement Components: Config, Core Reporter: Matt Kennedy Priority: Minor Attachments: trunk-7417.txt This patch adds two config elements to cassandra.yaml: listen_interface and rpc_interface. These can be used instead of their _address counterparts to bind the addresses Cassandra listens on. This capability can drastically simplify some deployment scenarios, especially in clouds which sometimes have quirky automation capabilities. -- This message was sent by Atlassian JIRA (v6.2#6252)
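The choice between `listen_address` and `listen_interface` (and likewise the `rpc_*` pair) described in CASSANDRA-7417 can be sketched as a small resolution function. This is not the logic in trunk-7417.txt. The rules that the two settings are mutually exclusive and that an interface must carry exactly one address are assumptions of this sketch, and the `interfaces` dictionary stands in for an operating-system lookup that the sketch does not perform.

```python
def resolve_bind_address(config, interfaces, prefix="listen"):
    """Pick the address to bind from `<prefix>_address` or `<prefix>_interface`.

    `config` is a dict of cassandra.yaml-style settings; `interfaces` maps an
    interface name to its list of addresses (a hypothetical stand-in for an OS query).
    """
    address = config.get(f"{prefix}_address")
    interface = config.get(f"{prefix}_interface")
    if address and interface:
        # Assumption: the two settings are alternatives, so setting both is an error.
        raise ValueError(f"set {prefix}_address or {prefix}_interface, not both")
    if address:
        return address
    if interface:
        addresses = interfaces.get(interface, [])
        if len(addresses) != 1:
            # Assumption: an ambiguous or missing interface is rejected rather than guessed.
            raise ValueError(
                f"{interface} must have exactly one address, found {len(addresses)}")
        return addresses[0]
    raise ValueError(f"neither {prefix}_address nor {prefix}_interface is set")
```

Configuring by interface name means a machine image can carry the same configuration file to every instance, which is the deployment simplification the description points to.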
[jira] [Commented] (CASSANDRA-7306) Support edge dcs with more flexible gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010231#comment-14010231 ] Matt Kennedy commented on CASSANDRA-7306: - It should be noted that we can do some of this today by defining keyspaces that have replicas in only some data centers, but gossip still needs to function over all the nodes. Hub-and-spoke functionality is useful in situations where the spokes are geographically dispersed, potentially in areas with less than ideal network connections. Local clients should be able to read/write locally relevant data on small scale clusters and make progress even when completely disconnected from the mothership, without having to worry about replicating back a lot of data from unrelated DCs, or having to be networked to DCs halfway across the planet just to gossip between nodes that are otherwise unrelated. Support edge dcs with more flexible gossip Key: CASSANDRA-7306 URL: https://issues.apache.org/jira/browse/CASSANDRA-7306 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Tupshin Harper Labels: ponies As Cassandra clusters get bigger and bigger, and their topology becomes more complex, there is more and more need for a notion of hub and spoke datacenters. One of the big obstacles to supporting hundreds (or thousands) of remote dcs is the assumption that all dcs need to talk to each other (and be connected all the time). This ticket is a vague placeholder with the goals of achieving: 1) better behavioral support for occasionally disconnected datacenters 2) explicit support for custom dc to dc routing. A simple approach would be an optional per-dc annotation of which other DCs that DC could gossip with. -- This message was sent by Atlassian JIRA (v6.2#6252)
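The "optional per-dc annotation" suggested in the CASSANDRA-7306 description can be pictured as an allow-list consulted when choosing gossip peers. The sketch below is purely illustrative: no such annotation exists in the text beyond the one-sentence proposal, and the data structures, DC names and addresses are hypothetical.

```python
def gossip_candidates(local_dc, nodes, allowed_peers):
    """Return the node addresses that nodes in `local_dc` may gossip with.

    `nodes` maps node address -> DC name. `allowed_peers` maps DC name -> set of
    DCs it may gossip with. A DC missing from `allowed_peers` may gossip with
    every DC, which keeps the annotation optional.
    """
    peers = allowed_peers.get(local_dc)
    return sorted(
        addr for addr, dc in nodes.items()
        if dc == local_dc or peers is None or dc in peers
    )


# Hypothetical hub-and-spoke topology: spokes talk to the hub, never to each other.
nodes = {"10.0.0.1": "hub", "10.1.0.1": "spoke_a", "10.2.0.1": "spoke_b"}
allowed = {"spoke_a": {"hub"}, "spoke_b": {"hub"}}
```

Here a node in `spoke_a` would consider only its own DC and the hub, so it never needs a network path to `spoke_b`.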
[jira] [Created] (CASSANDRA-2853) cassandra-cli has backwards index status message
cassandra-cli has backwards index status message Key: CASSANDRA-2853 URL: https://issues.apache.org/jira/browse/CASSANDRA-2853 Project: Cassandra Issue Type: Bug Components: Core Reporter: Matt Kennedy Priority: Trivial When a secondary index is building, the total bytes and processed bytes are swapped in the message. Example: Currently building index cf1, completed 12052040551 of 18047343 bytes. The problem is a call to CompactionInfo constructor with swapped parameters. Patch to follow. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
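The bug class described in CASSANDRA-2853, two same-typed positional arguments passed in the wrong order, can be guarded against with named arguments and a sanity check. The sketch below uses a hypothetical `BuildProgress` class; it is not the real CompactionInfo class and not the fix in the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildProgress:
    # Hypothetical stand-in for the progress object; not the real CompactionInfo.
    completed_bytes: int
    total_bytes: int

    def __post_init__(self):
        # A completed count larger than the total is the symptom of swapped arguments.
        if self.completed_bytes > self.total_bytes:
            raise ValueError("completed_bytes exceeds total_bytes; arguments swapped?")

    def message(self, index_name):
        return (f"Currently building index {index_name}, "
                f"completed {self.completed_bytes} of {self.total_bytes} bytes.")
```

Constructing the object with keyword arguments, for example `BuildProgress(completed_bytes=18047343, total_bytes=12052040551)`, makes the intended order explicit at the call site, and the check rejects the swapped form that produced the message quoted in the ticket.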
[jira] [Updated] (CASSANDRA-2853) cassandra-cli has backwards index status message
[ https://issues.apache.org/jira/browse/CASSANDRA-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Kennedy updated CASSANDRA-2853: Affects Version/s: 0.8.1 cassandra-cli has backwards index status message Key: CASSANDRA-2853 URL: https://issues.apache.org/jira/browse/CASSANDRA-2853 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.8.1 Reporter: Matt Kennedy Priority: Trivial Attachments: fix_idx_msg.patch When a secondary index is building, the total bytes and processed bytes are swapped in the message. Example: Currently building index cf1, completed 12052040551 of 18047343 bytes. The problem is a call to CompactionInfo constructor with swapped parameters. Patch to follow. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2853) cassandra-cli has backwards index status message
[ https://issues.apache.org/jira/browse/CASSANDRA-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Kennedy updated CASSANDRA-2853: Attachment: fix_idx_msg.patch cassandra-cli has backwards index status message Key: CASSANDRA-2853 URL: https://issues.apache.org/jira/browse/CASSANDRA-2853 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.8.1 Reporter: Matt Kennedy Priority: Trivial Attachments: fix_idx_msg.patch When a secondary index is building, the total bytes and processed bytes are swapped in the message. Example: Currently building index cf1, completed 12052040551 of 18047343 bytes. The problem is a call to CompactionInfo constructor with swapped parameters. Patch to follow. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (CASSANDRA-2276) Pig memory issues with default LIMIT and large rows.
[ https://issues.apache.org/jira/browse/CASSANDRA-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13003657#comment-13003657 ] Matt Kennedy commented on CASSANDRA-2276: - D'oh! I wrote it against a checkout of the 0.7.3 tag instead of trunk. I'll port the changes to trunk tonight. Sorry for the confusion. Pig memory issues with default LIMIT and large rows. Key: CASSANDRA-2276 URL: https://issues.apache.org/jira/browse/CASSANDRA-2276 Project: Cassandra Issue Type: Improvement Components: Hadoop Affects Versions: 0.7.0 Reporter: Matt Kennedy Priority: Trivial Labels: hadoop, pig Fix For: 0.7.4 Attachments: cassandrastorage.diff, cassandrastorage_2.diff Original Estimate: 1h Remaining Estimate: 1h Rows with a lot of columns, especially super-columns with a lot of values, can cause OutOfMemory errors in Cassandra when queried with Pig. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (CASSANDRA-2276) Pig memory issues with default LIMIT and large rows.
[ https://issues.apache.org/jira/browse/CASSANDRA-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Kennedy updated CASSANDRA-2276: Attachment: cassandrastorage3.diff OK, third time's the charm, coded this one against trunk and just successfully applied it to a fresh check-out. Pig memory issues with default LIMIT and large rows. Key: CASSANDRA-2276 URL: https://issues.apache.org/jira/browse/CASSANDRA-2276 Project: Cassandra Issue Type: Improvement Components: Hadoop Affects Versions: 0.7.0 Reporter: Matt Kennedy Priority: Trivial Labels: hadoop, pig Fix For: 0.7.4 Attachments: cassandrastorage.diff, cassandrastorage3.diff, cassandrastorage_2.diff Original Estimate: 1h Remaining Estimate: 1h Rows with a lot of columns, especially super-columns with a lot of values, can cause OutOfMemory errors in Cassandra when queried with Pig. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (CASSANDRA-2276) Pig memory issues with default LIMIT and large rows.
[ https://issues.apache.org/jira/browse/CASSANDRA-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002972#comment-13002972 ] Matt Kennedy commented on CASSANDRA-2276: Only for the purposes of counting the super columns, no access to the subcolumns. Pig memory issues with default LIMIT and large rows. Key: CASSANDRA-2276 URL: https://issues.apache.org/jira/browse/CASSANDRA-2276 Project: Cassandra Issue Type: Improvement Components: Hadoop Affects Versions: 0.7.0 Reporter: Matt Kennedy Priority: Trivial Labels: hadoop, pig Fix For: 0.7.4 Attachments: cassandrastorage.diff, cassandrastorage_2.diff Original Estimate: 1h Remaining Estimate: 1h Rows with a lot of columns, especially super-columns with a lot of values, can cause OutOfMemory errors in Cassandra when queried with Pig.
[jira] Created: (CASSANDRA-2276) Pig memory issues with default LIMIT and large rows.
Pig memory issues with default LIMIT and large rows. Key: CASSANDRA-2276 URL: https://issues.apache.org/jira/browse/CASSANDRA-2276 Project: Cassandra Issue Type: Improvement Components: Contrib Affects Versions: 0.7.3 Reporter: Matt Kennedy Priority: Trivial Rows with a lot of columns, especially super-columns with a lot of values, can cause OutOfMemory errors in Cassandra when queried with Pig.
[jira] Updated: (CASSANDRA-2276) Pig memory issues with default LIMIT and large rows.
[ https://issues.apache.org/jira/browse/CASSANDRA-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Kennedy updated CASSANDRA-2276: Attachment: cassandrastorage.diff Pig memory issues with default LIMIT and large rows. Key: CASSANDRA-2276 URL: https://issues.apache.org/jira/browse/CASSANDRA-2276 Project: Cassandra Issue Type: Improvement Components: Contrib Affects Versions: 0.7.3 Reporter: Matt Kennedy Priority: Trivial Labels: hadoop, pig Attachments: cassandrastorage.diff Original Estimate: 1h Remaining Estimate: 1h Rows with a lot of columns, especially super-columns with a lot of values, can cause OutOfMemory errors in Cassandra when queried with Pig.
[jira] Commented: (CASSANDRA-2245) Enable map reduce to use indexes for ColumnFamilyInputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002930#comment-13002930 ] Matt Kennedy commented on CASSANDRA-2245: I've taken a crack at coding this up, but I'm not thrilled with the results. I agree with Brandon that CASSANDRA-1600 is the best way to deal with this issue. The get_indexed_slices method doesn't accept a key_range parameter, and that is what would make it useful for a MapReduce job. I'm reviewing that discussion at the moment to see if there is a way to get a patch for something like this functionality out prior to 0.8 without breaking the Thrift API. Enable map reduce to use indexes for ColumnFamilyInputFormat Key: CASSANDRA-2245 URL: https://issues.apache.org/jira/browse/CASSANDRA-2245 Project: Cassandra Issue Type: Improvement Components: Hadoop Affects Versions: 0.7.2 Environment: Cassandra 0.7 or later and Hadoop 0.20.1 or later Reporter: Matt Kennedy Priority: Minor Labels: hadoop Fix For: 0.8 Original Estimate: 72h Remaining Estimate: 72h Enable the ability to run a MapReduce job that takes a value in an indexed column as a parameter, and use that to select the data that the MapReduce job operates on. Right now, it looks like this isn't possible because org.apache.cassandra.hadoop.ColumnFamilyRecordReader will only fetch data with get_range_slices, not get_indexed_slices.
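A rough illustration of why a key range matters here: each MapReduce input split should only see the indexed rows that fall inside its own slice of the data. The sketch below is a toy in-memory model, not Cassandra's Thrift API. The helper names (`build_index`, `indexed_keys_in_range`) are hypothetical, and ordered string keys stand in for the ranges that real splits would cover.

```python
from bisect import bisect_left, bisect_right


def build_index(rows, column):
    """Map each value of `column` to the sorted list of row keys holding it."""
    index = {}
    for key in sorted(rows):
        if column in rows[key]:
            index.setdefault(rows[key][column], []).append(key)
    return index


def indexed_keys_in_range(index, value, start_key, end_key):
    """Return keys whose indexed column equals `value`, limited to
    start_key <= key <= end_key (the hypothetical key range of one split)."""
    keys = index.get(value, [])
    lo = bisect_left(keys, start_key)
    hi = bisect_right(keys, end_key)
    return keys[lo:hi]


# Toy data: row key -> columns (assumed layout, for illustration only).
rows = {
    "a": {"country": "UK"},
    "c": {"country": "US"},
    "f": {"country": "UK"},
    "m": {"country": "UK"},
    "t": {"country": "FR"},
}
index = build_index(rows, "country")

# Two hypothetical input splits, each owning a contiguous key range.
splits = [("a", "g"), ("h", "z")]
per_split = [indexed_keys_in_range(index, "UK", start, end) for start, end in splits]
```

Without the range restriction, every split would get the full list of matching keys and the job would process each matching row once per split.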
[jira] Commented: (CASSANDRA-2276) Pig memory issues with default LIMIT and large rows.
[ https://issues.apache.org/jira/browse/CASSANDRA-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002936#comment-13002936 ] Matt Kennedy commented on CASSANDRA-2276: Yeah, fair point. It isn't really useful; I was just letting Eclipse write code for me. Pig memory issues with default LIMIT and large rows. Key: CASSANDRA-2276 URL: https://issues.apache.org/jira/browse/CASSANDRA-2276 Project: Cassandra Issue Type: Improvement Components: Hadoop Affects Versions: 0.7.0 Reporter: Matt Kennedy Priority: Trivial Labels: hadoop, pig Fix For: 0.7.4 Attachments: cassandrastorage.diff Original Estimate: 1h Remaining Estimate: 1h Rows with a lot of columns, especially super-columns with a lot of values, can cause OutOfMemory errors in Cassandra when queried with Pig.
[jira] Updated: (CASSANDRA-2276) Pig memory issues with default LIMIT and large rows.
[ https://issues.apache.org/jira/browse/CASSANDRA-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Kennedy updated CASSANDRA-2276: Attachment: cassandrastorage_2.diff New patch reflecting Jonathan Ellis' comment. Pig memory issues with default LIMIT and large rows. Key: CASSANDRA-2276 URL: https://issues.apache.org/jira/browse/CASSANDRA-2276 Project: Cassandra Issue Type: Improvement Components: Hadoop Affects Versions: 0.7.0 Reporter: Matt Kennedy Priority: Trivial Labels: hadoop, pig Fix For: 0.7.4 Attachments: cassandrastorage.diff, cassandrastorage_2.diff Original Estimate: 1h Remaining Estimate: 1h Rows with a lot of columns, especially super-columns with a lot of values, can cause OutOfMemory errors in Cassandra when queried with Pig.
[jira] Updated: (CASSANDRA-2276) Pig memory issues with default LIMIT and large rows.
[ https://issues.apache.org/jira/browse/CASSANDRA-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Kennedy updated CASSANDRA-2276: Attachment: (was: cassandrastorage_2.diff) Pig memory issues with default LIMIT and large rows. Key: CASSANDRA-2276 URL: https://issues.apache.org/jira/browse/CASSANDRA-2276 Project: Cassandra Issue Type: Improvement Components: Hadoop Affects Versions: 0.7.0 Reporter: Matt Kennedy Priority: Trivial Labels: hadoop, pig Fix For: 0.7.4 Attachments: cassandrastorage.diff Original Estimate: 1h Remaining Estimate: 1h Rows with a lot of columns, especially super-columns with a lot of values, can cause OutOfMemory errors in Cassandra when queried with Pig.
[jira] Updated: (CASSANDRA-2276) Pig memory issues with default LIMIT and large rows.
[ https://issues.apache.org/jira/browse/CASSANDRA-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Kennedy updated CASSANDRA-2276: Attachment: cassandrastorage_2.diff Corrected patch for final limit. Pig memory issues with default LIMIT and large rows. Key: CASSANDRA-2276 URL: https://issues.apache.org/jira/browse/CASSANDRA-2276 Project: Cassandra Issue Type: Improvement Components: Hadoop Affects Versions: 0.7.0 Reporter: Matt Kennedy Priority: Trivial Labels: hadoop, pig Fix For: 0.7.4 Attachments: cassandrastorage.diff, cassandrastorage_2.diff Original Estimate: 1h Remaining Estimate: 1h Rows with a lot of columns, especially super-columns with a lot of values, can cause OutOfMemory errors in Cassandra when queried with Pig.
[jira] Created: (CASSANDRA-2245) Enable map reduce to use indexes for ColumnFamilyInputFormat
Enable map reduce to use indexes for ColumnFamilyInputFormat Key: CASSANDRA-2245 URL: https://issues.apache.org/jira/browse/CASSANDRA-2245 Project: Cassandra Issue Type: Improvement Components: Hadoop Affects Versions: 0.7.2 Environment: Cassandra 0.7 or later and Hadoop 0.20.1 or later Reporter: Matt Kennedy Priority: Minor Fix For: 0.8 Enable the ability to run a MapReduce job that takes a value in an indexed column as a parameter, and use that to select the data that the MapReduce job operates on. Right now, it looks like this isn't possible because org.apache.cassandra.hadoop.ColumnFamilyRecordReader will only fetch data with get_range_slices, not get_indexed_slices.
[jira] Created: (CASSANDRA-2246) Enable Pig to use indexed data as described in CASSANDRA-2245
Enable Pig to use indexed data as described in CASSANDRA-2245 Key: CASSANDRA-2246 URL: https://issues.apache.org/jira/browse/CASSANDRA-2246 Project: Cassandra Issue Type: Improvement Components: Contrib Affects Versions: 0.7.2 Reporter: Matt Kennedy Priority: Minor Fix For: 0.8 In contrib/pig, add query parameters to the CassandraStorage keyspace/column family string to specify column search predicates. For example: rows = LOAD 'cassandra://mykeyspace/mycolumnfamily?country=UK' using CassandraStorage(); This depends on CASSANDRA-2245.
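As a minimal sketch of how such a location string might be split into its parts, the Python snippet below separates the keyspace, the column family, and the query parameters that would become column search predicates. It is not the CassandraStorage implementation; `parse_location` is a hypothetical helper, and treating each `name=value` pair as an equality predicate is an assumption based on the `country=UK` example.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_location(location):
    """Split a cassandra:// location into (keyspace, column_family, predicates).

    Each query parameter is returned as a (column, value) pair, which this
    sketch assumes would be treated as an equality predicate.
    """
    parts = urlsplit(location)
    if parts.scheme != "cassandra":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    keyspace = parts.netloc
    column_family = parts.path.strip("/")
    if not keyspace or not column_family:
        raise ValueError("expected cassandra://<keyspace>/<column family>")
    predicates = parse_qsl(parts.query, keep_blank_values=True)
    return keyspace, column_family, predicates


keyspace, column_family, predicates = parse_location(
    "cassandra://mykeyspace/mycolumnfamily?country=UK"
)
```

A location with no query string yields an empty predicate list, so existing LOAD statements would keep their current meaning under this scheme.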