[jira] [Commented] (CASSANDRA-47) SSTable compression

2011-07-10 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062678#comment-13062678
 ] 

Stu Hood commented on CASSANDRA-47:
---

bq. example ./bin/stress -S 1024 -n 100 -C 250 -V [...] I see 3.8GB 
compressed into 781MB in my tests.
Why would the uncompressed size for -C 250 be different from the uncompressed 
size for -C 50: 3.8 GB vs the 1.7 GB from before?

On a side note: a cardinality of 250 makes for less variance in the average 
size of the randomly generated values, but I still see relatively large size 
differences between consecutive runs. It might be worth opening a separate 
ticket for stress.java to make the generated random values a fixed size.
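Purely to illustrate that suggestion (this is a hypothetical sketch, not 
stress.java's actual code; the class and parameter names are made up), 
fixed-size value generation could look like:

{code:java}
import java.util.Random;

public class FixedSizeValues
{
    private static final Random RANDOM = new Random();

    // Every generated value is exactly 'size' bytes, so the on-disk volume of
    // a run no longer depends on the luck of the draw.
    public static byte[] nextValue(int size)
    {
        byte[] value = new byte[size];
        RANDOM.nextBytes(value);
        return value;
    }

    public static void main(String[] args)
    {
        System.out.println(nextValue(34).length); // always 34, run after run
    }
}
{code}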



With the understanding that there is a lot of variance in the results, here are 
some preliminary results for the _bin/stress -S 1024 -n 100 -C 250 -V_ 
workload:

|| build || disk volume (bytes) || write runtime (s) || read runtime (s) || read ops/s ||
| trunk | 4,015,004,187 | 372 |  1000s | ~216 |
| #674 + #2319 | 594,796,624 | 273 | 255 | 3845 |
| #47* |  |  |  |  |
\* need to figure out the problem I was having above

 SSTable compression
 ---

 Key: CASSANDRA-47
 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
  Labels: compression
 Fix For: 1.0

 Attachments: CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar


 We should be able to do SSTable compression which would trade CPU for I/O 
 (almost always a good trade).
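
Given the attached snappy-java-1.0.3-rc4.jar, here is a minimal stand-alone 
sketch of that CPU-for-I/O trade using Snappy (the input data and block size 
are invented for illustration; this is not the patch's code path):

{code:java}
import java.util.Arrays;
import org.xerial.snappy.Snappy;

public class SnappySketch
{
    public static void main(String[] args) throws Exception
    {
        // Highly repetitive input compresses very well -- the point of SSTable
        // compression is to spend a little CPU here to read/write fewer bytes.
        byte[] block = new byte[64 * 1024];
        Arrays.fill(block, (byte) 'x');

        byte[] compressed = Snappy.compress(block);
        byte[] restored = Snappy.uncompress(compressed);

        System.out.println(block.length + " -> " + compressed.length + " bytes");
        System.out.println("round-trip ok: " + Arrays.equals(block, restored));
    }
}
{code}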





[jira] [Commented] (CASSANDRA-47) SSTable compression

2011-07-10 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062709#comment-13062709
 ] 

Pavel Yaskevich commented on CASSANDRA-47:
--

bq. I haven't had any luck seeing actual compression with this patch... is 
there a manual step to enable it? On OSX, the patch slowed the server down to a 
crawl, but did not result in compression. Performance seems to be reasonable on 
Linux, but without any effect: running bin/stress -S 1024 -n 100 -C 250 -V 
resulted in 3.3 GB of data.

As I wrote previously: right now on Linux you can see the compressed data size 
by running `ls` with the `-ahs` flags and taking the size in blocks, because 
the current implementation uses file holes (it reserves free space for future 
chunk changes). I'm currently working on a way to eliminate that need.
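
For anyone unfamiliar with file holes, a small stand-alone Java sketch (not 
code from the patch; the file name and sizes are made up) shows why the 
logical length and the allocated blocks diverge for a sparse file:

{code:java}
import java.io.File;
import java.io.RandomAccessFile;

public class FileHoleDemo
{
    public static void main(String[] args) throws Exception
    {
        File f = new File("hole-demo.db");
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        raf.seek(64L * 1024 * 1024);   // skip ahead 64MB without writing anything
        raf.write("chunk".getBytes()); // only this small write consumes disk blocks
        raf.close();

        // On most Linux filesystems, `ls -l hole-demo.db` reports the ~64MB
        // logical length, while `ls -s` / `du` report only a few KB of
        // allocated blocks -- the same effect described above.
        System.out.println("logical length: " + f.length() + " bytes");
    }
}
{code}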





[jira] [Created] (CASSANDRA-2877) git-cassandra-angosso-angosso.html

2011-07-10 Thread Roger Mbiama (JIRA)
git-cassandra-angosso-angosso.html
--

 Key: CASSANDRA-2877
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2877
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Affects Versions: 0.8.0
 Environment: 
https://ango...@github.com/angosso/git-cassandra-angosso-angosso.html.git
http://angosso.git.sourceforge.net/git/gitweb-index.cgi configure 
include/config.h dlls/Makefile.in programs/Makefile.in */Makefile
Reporter: Roger Mbiama
Priority: Critical
 Fix For: 0.8.0


Requirements

  * Java >= 1.6 (OpenJDK and Sun have been tested)

Getting started
---

This short guide will walk you through getting a basic one node cluster up
and running, and demonstrate some simple reads and writes.

  * tar -zxvf apache-cassandra-$VERSION.tar.gz
  * cd apache-cassandra-$VERSION
  * sudo mkdir -p /var/log/cassandra
  * sudo chown -R `angosso` /var/log/cassandra
  * sudo mkdir -p /var/lib/cassandra
  * sudo chown -R `angosso` /var/lib/cassandra

Note: The sample configuration files in conf/ determine the file-system 
locations Cassandra uses for logging and data storage. You are free to
change these to suit your own environment and adjust the path names
used here accordingly.

Now that we're ready, let's start it up!

  * bin/cassandra -f

Running the startup script with the -f argument will cause Cassandra to
remain in the foreground and log to standard out.

Now let's try to read and write some data using the command line client.

  * bin/cassandra-cli --host http://angosso1.w02.winhost.com --port 9160

The command line client is interactive, so if everything worked you should
be sitting in front of a prompt...

  Connected to http://angosso1.w02.winhost.com/9160
  Welcome to cassandra CLI.

  Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
  cassandra>

As the banner says, you can use 'help' or '?' to see what the CLI has to
offer, and 'quit' or 'exit' when you've had enough fun. But let's try
something slightly more interesting...

  cassandra> set Keyspace1.Standard2['rmbiama']['first'] = 'Roger'
  Value inserted.
  cassandra> set Keyspace1.Standard2['rmbiama']['last'] = 'mbiama'
  Value inserted.
  cassandra> set Keyspace1.Standard2['rmbiama']['age'] = '54'
  Value inserted.
  cassandra> get Keyspace1.Standard2['rmbiama']
  (column=age, value=54; timestamp=1249930062801)
  (column=first, value=Roger; timestamp=1249930053103)
  (column=last, value=Mbiama; timestamp=1249930058345)
  Returned 3 rows.
  cassandra>

If your session looks similar to what's above, congrats, your single node
cluster is operational! But what exactly was all of that? Let's break it
down into pieces and see.

  set Keyspace1.Standard2['rmbiama']['first'] = 'Roger'
       |         |           |         |          |
       |         |           |         |          \_ value
       |         |           |         \_ column
       |         |           \_ key
       |         \_ column family
       \_ keyspace
Data stored in Cassandra is associated with a column family (Standard2),
which in turn is associated with a keyspace (Keyspace1). In the example
above, we set the value 'Roger' in the 'first' column for key 'rmbiama'.

Mirror of Apache Cassandra (incubating); install schematool in the Debian 
package. A commit object contains a (possibly empty) list of the logical 
predecessor(s) in the line of development 
(GIT-cassandra/angosso/angosso.html), i.e. its parents. [rogerM]






[jira] [Resolved] (CASSANDRA-2877) git-cassandra-angosso-angosso.html

2011-07-10 Thread Roger Mbiama (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Mbiama resolved CASSANDRA-2877.
-

Resolution: Fixed

 git-cassandra-angosso-angosso.html
 --

 Key: CASSANDRA-2877
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2877
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Affects Versions: 0.8.0
 Environment: 
 https://ango...@github.com/angosso/git-cassandra-angosso-angosso.html.git
 http://angosso.git.sourceforge.net/git/gitweb-index.cgi configure 
 include/config.h dlls/Makefile.in programs/Makefile.in */Makefile
Reporter: Roger Mbiama
Priority: Critical
  Labels: features
 Fix For: 0.8.0

   Original Estimate: 504h
  Remaining Estimate: 504h





[jira] [Updated] (CASSANDRA-2877) git-cassandra-angosso-angosso.html

2011-07-10 Thread Roger Mbiama (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Mbiama updated CASSANDRA-2877:



[jira] [Commented] (CASSANDRA-47) SSTable compression

2011-07-10 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062729#comment-13062729
 ] 

Jonathan Ellis commented on CASSANDRA-47:
-

bq. the current implementation uses file holes (reserves free space for future 
chunk changes)

Is this for where we have to seek back to write the row size on large rows?

We're already making two passes when compacting large rows (the first to 
compute the indexes); we could make the first pass also compute the serialized 
size, so we don't have to seek back after the second pass that writes the data.
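
To make the two alternatives concrete, here is an illustrative sketch 
(hypothetical method and file names; the row is buffered in memory only to 
keep the example short -- in the real code the size would come from the first 
compaction pass / CF.serializedSize() rather than from buffering):

{code:java}
import java.io.IOException;
import java.io.RandomAccessFile;

public class SeeklessWriteSketch
{
    static byte[] serializeRow()
    {
        return "pretend this is a serialized row".getBytes();
    }

    // Current pattern: reserve space for the size, write the row, seek back.
    static void writeWithSeekBack(RandomAccessFile out) throws IOException
    {
        long sizePos = out.getFilePointer();
        out.writeLong(0L);                 // placeholder for the row size
        byte[] row = serializeRow();
        out.write(row);
        long end = out.getFilePointer();
        out.seek(sizePos);
        out.writeLong(row.length);         // back-patch the real size
        out.seek(end);
    }

    // Seekless alternative: the size is known before any data is written.
    static void writeSeekless(RandomAccessFile out) throws IOException
    {
        byte[] row = serializeRow();
        out.writeLong(row.length);
        out.write(row);
    }

    public static void main(String[] args) throws IOException
    {
        RandomAccessFile out = new RandomAccessFile("sketch.db", "rw");
        writeWithSeekBack(out);
        writeSeekless(out);
        out.close();
    }
}
{code}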





[jira] [Commented] (CASSANDRA-47) SSTable compression

2011-07-10 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062730#comment-13062730
 ] 

Pavel Yaskevich commented on CASSANDRA-47:
--

bq. Why would the uncompressed size for -C 250 be different from the 
uncompressed size for -C 50: 3.8 GB vs the 1.7 GB from before?

I simply chose the largest files from the individual compactions.

Also, I don't know why you are getting only 3.7GB of data after `-S 1024 -n 100 
-C 250 -V`; in all of my tests I get about 5.1GB, both with the current trunk 
code and with the patch applied.

My results:

||build||disk volume (bytes)||write runtime (s)||read runtime (s)||read ops/s||
|trunk|5,241,718,144|166|2210|~450|
|#47|5,090,003,162 (1.2GB of blocks, aka real size)|156|480|~2100|

Both sizes are after the last major compaction. Cassandra was running with the 
default configuration on a Quad-Core AMD Opteron(tm) Processor 2374 HE with 
4229730MHz on each core and 2GB RAM, on Debian 5.0 (Lenny) (kernel 
2.6.35.4-rscloud) hosted on Rackspace.





[jira] [Commented] (CASSANDRA-47) SSTable compression

2011-07-10 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062731#comment-13062731
 ] 

Pavel Yaskevich commented on CASSANDRA-47:
--

bq. Is this for where we have to seek back to write the row size on large rows?

Exactly.

bq. We're already making two passes when compacting large rows (the first to 
compute the indexes); we could make the first pass also compute the serialized 
size, so we don't have to seek back after the second pass that writes the data.

I will look into it.





[jira] [Commented] (CASSANDRA-47) SSTable compression

2011-07-10 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062739#comment-13062739
 ] 

Jonathan Ellis commented on CASSANDRA-47:
-

My preference would be to do seekless 2-pass as a separate patch before this 
one, btw.





[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query

2011-07-10 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062812#comment-13062812
 ] 

Mck SembWever commented on CASSANDRA-1125:
--

+1 (tested) on 1125-v3.txt

 Filter out ColumnFamily rows that aren't part of the query
 --

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 1.0

 Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, 
 CASSANDRA-1125.patch


 Currently, a MapReduce job run against data in a Cassandra data store reads
 through all the data for a particular ColumnFamily. This could be optimized
 to only read through those rows that are relevant to the query.
 It's a small change, but I wanted to put it in Jira so that it didn't fall
 through the cracks.
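
To make the "using a KeyRange" idea concrete, here is a hedged sketch of the 
Thrift KeyRange such a filter would pass to get_range_slices (the keys and 
page size are made up, and how the range gets wired into 
ConfigHelper/ColumnFamilyInputFormat is whatever the attached patch does -- 
that part is not reproduced here):

{code:java}
import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.KeyRange;
import org.apache.cassandra.utils.ByteBufferUtil;

public class KeyRangeSketch
{
    public static void main(String[] args)
    {
        // Hypothetical bounds: the point is simply that the MapReduce input
        // can be limited to a slice of the row space instead of scanning the
        // whole ColumnFamily.
        ByteBuffer start = ByteBufferUtil.bytes("user:1000");
        ByteBuffer end   = ByteBufferUtil.bytes("user:2000");

        KeyRange range = new KeyRange()
                .setStart_key(start)
                .setEnd_key(end)
                .setCount(4096); // rows fetched per get_range_slices page

        System.out.println(range);
    }
}
{code}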





[jira] [Updated] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)

2011-07-10 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-1125:
-

Summary: Filter out ColumnFamily rows that aren't part of the query (using 
a KeyRange)  (was: Filter out ColumnFamily rows that aren't part of the query)





[jira] [Created] (CASSANDRA-2878) Filter out ColumnFamily rows that aren't part of the query (using a IndexClause)

2011-07-10 Thread Mck SembWever (JIRA)
Filter out ColumnFamily rows that aren't part of the query (using a IndexClause)


 Key: CASSANDRA-2878
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2878
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Minor
 Fix For: 1.0


Currently, a MapReduce job run against data in a Cassandra data store reads
through all the data for a particular ColumnFamily. This could be optimized to
only read through those rows that are relevant to the query.

It's a small change, but I wanted to put it in Jira so that it didn't fall
through the cracks.
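
As with the KeyRange variant on CASSANDRA-1125, here is a hedged sketch of the 
Thrift IndexClause such a filter would be built from (the column name, value 
and page size are made up; the plumbing into the Hadoop job configuration is 
left to the eventual patch):

{code:java}
import java.util.Arrays;
import org.apache.cassandra.thrift.IndexClause;
import org.apache.cassandra.thrift.IndexExpression;
import org.apache.cassandra.thrift.IndexOperator;
import org.apache.cassandra.utils.ByteBufferUtil;

public class IndexClauseSketch
{
    public static void main(String[] args)
    {
        // Hypothetical secondary-index predicate: only rows whose 'state'
        // column equals 'NY' would be fed to the mappers, instead of every
        // row in the ColumnFamily.
        IndexExpression expr = new IndexExpression(
                ByteBufferUtil.bytes("state"),
                IndexOperator.EQ,
                ByteBufferUtil.bytes("NY"));

        IndexClause clause = new IndexClause(
                Arrays.asList(expr),
                ByteBufferUtil.EMPTY_BYTE_BUFFER, // start at the beginning of the range
                4096);                            // rows per page

        System.out.println(clause);
    }
}
{code}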





[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)

2011-07-10 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062814#comment-13062814
 ] 

Mck SembWever commented on CASSANDRA-1125:
--

Created CASSANDRA-2878 for the better solution using an IndexClause.





[jira] [Created] (CASSANDRA-2879) Make SSTableWriter.append(...) methods seekless.

2011-07-10 Thread Pavel Yaskevich (JIRA)
Make SSTableWriter.append(...) methods seekless.


 Key: CASSANDRA-2879
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2879
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
 Fix For: 1.0


As we already have a CF.serializedSize() method, we don't need to reserve a 
place to store the data size when we write data to an SSTable. Compaction 
should be seekless too, because we can calculate the data size before we write 
the actual content.
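
A minimal sketch of the intended shape of the append path (hypothetical names 
and layout; serializedSize() below merely stands in for CF.serializedSize()), 
complementing the seek-back comparison earlier in the CASSANDRA-47 thread -- 
once the size is known up front, a purely sequential stream is enough:

{code:java}
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class SeeklessAppendSketch
{
    // Stand-in for ColumnFamily.serializedSize(): size is known before writing.
    static long serializedSize(byte[] row)
    {
        return row.length;
    }

    static void append(DataOutputStream out, byte[] key, byte[] row) throws IOException
    {
        out.writeShort(key.length);          // key length + key
        out.write(key);
        out.writeLong(serializedSize(row));  // data size written up front, no seek back
        out.write(row);
    }

    public static void main(String[] args) throws IOException
    {
        DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream("sketch-Data.db")));
        append(out, "rmbiama".getBytes(), "serialized row bytes".getBytes());
        out.close();
    }
}
{code}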





[jira] [Commented] (CASSANDRA-2771) Remove commitlog_rotation_threshold_in_mb

2011-07-10 Thread Kirk True (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062850#comment-13062850
 ] 

Kirk True commented on CASSANDRA-2771:
--

There's still a reference in examples/client_only/conf/cassandra.yaml

  Remove commitlog_rotation_threshold_in_mb
 --

 Key: CASSANDRA-2771
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2771
 Project: Cassandra
  Issue Type: Improvement
Reporter: Patricio Echague
Assignee: Patricio Echague
Priority: Minor
  Labels: commitlog
 Fix For: 1.0

 Attachments: CASSANDRA-2771-2-trunk.txt, CASSANDRA-2771-3-trunk.txt


 Remove the commitlog segment size config setting; nobody has ever changed it.
