[jira] [Commented] (CASSANDRA-3124) java heap limit for nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097034#comment-13097034 ]

Zenek Kraweznik commented on CASSANDRA-3124:
--------------------------------------------

Oh, and one important thing: I haven't changed any default Java limit in the Java config; I've modified only cassandra-env.sh.

java heap limit for nodetool
----------------------------

                 Key: CASSANDRA-3124
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3124
             Project: Cassandra
          Issue Type: Improvement
          Components: Core, Tools
    Affects Versions: 0.8.1, 0.8.2, 0.8.3, 0.8.4
         Environment: not important
            Reporter: Zenek Kraweznik
            Priority: Minor

By default (from the Debian package):

# nodetool
Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.
#

and:

--- /usr/bin/nodetool.old	2011-09-02 14:15:14.228152799 +0200
+++ /usr/bin/nodetool	2011-09-02 14:14:28.745154552 +0200
@@ -55,7 +55,7 @@
     ;;
 esac
 
-$JAVA -cp $CLASSPATH -Dstorage-config=$CASSANDRA_CONF \
+$JAVA -Xmx32m -cp $CLASSPATH -Dstorage-config=$CASSANDRA_CONF \
     -Dlog4j.configuration=log4j-tools.properties \
     org.apache.cassandra.tools.NodeCmd $@

After every upgrade I had to add the limit manually. I think it's a good idea to add it by default ;)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3133) nodetool netstats doesn't show streams during decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097036#comment-13097036 ]

Zenek Kraweznik commented on CASSANDRA-3133:
--------------------------------------------

When I type "nodetool -h $IP netstats" or "nodetool -h $IP streams" I see only:

Nothing streaming to /10.117.199.232

I also think the decommission of the node will not end.

# nodetool -h 10.10.10.11 ring
Address      DC           Rack   Status  State    Load       Owns     Token
10.10.10.11  datacenter1  rack1  Up      Normal   193.5 GB   25.00%   0
10.10.10.12  datacenter1  rack1  Up      Normal   252.07 GB  33.33%   56713727820156410577229101238628035242
10.10.10.13  datacenter1  rack1  Up      Normal   188.63 GB  33.33%   113427455640312821154458202477256070485
10.10.10.14  datacenter1  rack1  Up      Leaving  141.97 GB  8.33%    127605887595351923798765477786913079296

The 4th host has been in the Leaving state for 72 hours.

# nodetool -h 10.10.10.14 decommission ; echo END

The nodetool process is still running; END has not been printed yet. 8 GB of RAM is free (25%; the servers have 32 GB), the CPU is not loaded, and there is a lot of free disk space. Network traffic is about 25 Kbit/s, but the links are 1 Gbps.

nodetool netstats doesn't show streams during decommission
----------------------------------------------------------

                 Key: CASSANDRA-3133
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3133
             Project: Cassandra
          Issue Type: Bug
          Components: Tools
    Affects Versions: 0.8.4
         Environment: debian 6.0.2.1 (squeeze), java 1.6.26 (Sun, non-free packages).
            Reporter: Zenek Kraweznik

nodetool netstats is not showing transferred files during decommission.
[jira] [Created] (CASSANDRA-3136) Allow CFIF to keep going despite unavailable ranges
Allow CFIF to keep going despite unavailable ranges
---------------------------------------------------

                 Key: CASSANDRA-3136
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3136
             Project: Cassandra
          Issue Type: Improvement
          Components: Hadoop
            Reporter: Mck SembWever
            Priority: Minor

From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902

use-case-1:
We use Cassandra as storage for web pages; we store the HTML, all URLs that have the same HTML data, and some computed data. We run Hadoop MR jobs to compute lexical and thematic data for each page and to export the data to binary files for later use. A URL gets into Cassandra on a user request (a pageview), so if we delete a URL, it comes back quickly if the page is active. Because of that, and because there is a lot of data, we have the keyspace set to RF=1. We can drop the whole keyspace and it will regenerate quickly and contain only fresh data, so we don't care about losing a node.

use-case-2:
Trying to extract a small random sample (like a Pig SAMPLE) of data out of Cassandra.

use-case-3:
Searching for something or some pattern where one hit is enough. If you get the hit, it's a positive result regardless of whether ranges were ignored; if you don't, and you *know* a range was ignored along the way, you can re-run the job later. For example, such a job could be run at regular intervals during the day until a hit was found.
[jira] [Updated] (CASSANDRA-3136) Allow CFIF to keep going despite unavailable ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mck SembWever updated CASSANDRA-3136:
-------------------------------------

    Description:

From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902

use-case-1 (from Patrik Modesto):
We use Cassandra as storage for web pages; we store the HTML, all URLs that have the same HTML data, and some computed data. We run Hadoop MR jobs to compute lexical and thematic data for each page and to export the data to binary files for later use. A URL gets into Cassandra on a user request (a pageview), so if we delete a URL, it comes back quickly if the page is active. Because of that, and because there is a lot of data, we have the keyspace set to RF=1. We can drop the whole keyspace and it will regenerate quickly and contain only fresh data, so we don't care about losing a node.

use-case-2:
Trying to extract a small random sample (like a Pig SAMPLE) of data out of Cassandra.

use-case-3:
Searching for something or some pattern where one hit is enough. If you get the hit, it's a positive result regardless of whether ranges were ignored; if you don't, and you *know* a range was ignored along the way, you can re-run the job later. For example, such a job could be run at regular intervals during the day until a hit was found.

    was:

From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902

use-case-1:
We use Cassandra as storage for web pages; we store the HTML, all URLs that have the same HTML data, and some computed data. We run Hadoop MR jobs to compute lexical and thematic data for each page and to export the data to binary files for later use. A URL gets into Cassandra on a user request (a pageview), so if we delete a URL, it comes back quickly if the page is active. Because of that, and because there is a lot of data, we have the keyspace set to RF=1. We can drop the whole keyspace and it will regenerate quickly and contain only fresh data, so we don't care about losing a node.

use-case-2:
Trying to extract a small random sample (like a Pig SAMPLE) of data out of Cassandra.

use-case-3:
Searching for something or some pattern where one hit is enough. If you get the hit, it's a positive result regardless of whether ranges were ignored; if you don't, and you *know* a range was ignored along the way, you can re-run the job later. For example, such a job could be run at regular intervals during the day until a hit was found.

Allow CFIF to keep going despite unavailable ranges
---------------------------------------------------

                 Key: CASSANDRA-3136
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3136
             Project: Cassandra
          Issue Type: Improvement
          Components: Hadoop
            Reporter: Mck SembWever
            Priority: Minor
[jira] [Commented] (CASSANDRA-3031) Add 4 byte integer type
[ https://issues.apache.org/jira/browse/CASSANDRA-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097045#comment-13097045 ]

Eric Evans commented on CASSANDRA-3031:
---------------------------------------

{quote}
I think that you are worried too much about backward CQL compatibility. Cassandra clusters are operated by responsible persons, and it does not break existing clusters. Fixing schema-creating CQL scripts is trivial (if they exist).
{quote}

It's not an idle concern, it's a reaction.

{quote}
Let's say we change CQL int from int8 to int4. It will create a new cluster with an unexpected schema, but the application will get an exception on the first insert; you can't validate an int8 in an int4 field. The admin can fix the schema via cassandra-cli or fix the CQL script. It's a fail-fast scenario.
{quote}

... and update any code that makes assumptions about the length returned, etc, etc.

Cassandra's next release will be the coveted 1.0. What that means precisely is up for debate, but I think everyone is in agreement that it communicates "We're All Grown Up". For me, that means we're past the point where we can realistically suggest that people smoke test, then pick up the broken pieces.

Add 4 byte integer type
-----------------------

                 Key: CASSANDRA-3031
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3031
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.8.4
         Environment: any
            Reporter: Radim Kolar
            Priority: Minor
              Labels: hector, lhf
             Fix For: 1.0
         Attachments: apache-cassandra-0.8.4-SNAPSHOT.jar, src.diff, test.diff

Cassandra currently lacks support for a 4-byte fixed-size integer data type. The Java API Hector and the C libcassandra like to serialize integers as 4 bytes in network order. The problem is that you can't use cassandra-cli to manipulate the stored rows. Compatibility with other applications using an API that follows the Cassandra integer encoding standard is problematic too. Because adding a new datatype/validator is fairly simple, I recommend adding an int4 data type.

Compatibility with Hector is important because it is the most used Java Cassandra API and a lot of applications are using it. This problem was discussed several times already:
http://comments.gmane.org/gmane.comp.db.hector.user/2125
https://issues.apache.org/jira/browse/CASSANDRA-2585

It would be nice to have compatibility with cassandra-cli and other applications without rewriting Hector apps.
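For illustration, a minimal sketch of what "serialize integers as 4 bytes in network order" means. This is not Hector's actual code; it just shows the encoding in question. Java's ByteBuffer is big-endian by default, which is network byte order.

```java
import java.nio.ByteBuffer;

public class Int4Encoding {
    // Encode an int as exactly 4 bytes in network (big-endian) byte order.
    static byte[] encode(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Decode 4 big-endian bytes back into an int.
    static int decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        byte[] b = encode(42);
        assert b.length == 4;
        assert b[0] == 0 && b[3] == 42; // most significant byte first
        assert decode(b) == 42;
        System.out.println("ok");
    }
}
```

A validator for an int4 type would reject any value whose byte length is not exactly 4, which is what makes the 8-byte encoding used elsewhere incompatible.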
[jira] [Issue Comment Edited] (CASSANDRA-3031) Add 4 byte integer type
[ https://issues.apache.org/jira/browse/CASSANDRA-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097045#comment-13097045 ]

Eric Evans edited comment on CASSANDRA-3031 at 9/5/11 8:17 AM:
---------------------------------------------------------------

{quote}
I think that you are worried too much about backward CQL compatibility. Cassandra clusters are operated by responsible persons, and it does not break existing clusters. Fixing schema-creating CQL scripts is trivial (if they exist).
{quote}

It's not an idle concern, it's a reaction.

{quote}
Let's say we change CQL int from int8 to int4. It will create a new cluster with an unexpected schema, but the application will get an exception on the first insert; you can't validate an int8 in an int4 field. The admin can fix the schema via cassandra-cli or fix the CQL script. It's a fail-fast scenario.
{quote}

Cassandra's next release will be the coveted 1.0. What that means precisely is up for debate, but I think everyone is in agreement that it communicates "We're All Grown Up". For me, that means we're past the point where we can realistically suggest that people smoke test, then pick up the broken pieces.

was (Author: urandom):

{quote}
I think that you are worried too much about backward CQL compatibility. Cassandra clusters are operated by responsible persons, and it does not break existing clusters. Fixing schema-creating CQL scripts is trivial (if they exist).
{quote}

It's not an idle concern, it's a reaction.

{quote}
Let's say we change CQL int from int8 to int4. It will create a new cluster with an unexpected schema, but the application will get an exception on the first insert; you can't validate an int8 in an int4 field. The admin can fix the schema via cassandra-cli or fix the CQL script. It's a fail-fast scenario.
{quote}

... and update any code that makes assumptions about the length returned, etc, etc.

Cassandra's next release will be the coveted 1.0. What that means precisely is up for debate, but I think everyone is in agreement that it communicates "We're All Grown Up". For me, that means we're past the point where we can realistically suggest that people smoke test, then pick up the broken pieces.

Add 4 byte integer type
-----------------------

                 Key: CASSANDRA-3031
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3031
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.8.4
         Environment: any
            Reporter: Radim Kolar
            Priority: Minor
              Labels: hector, lhf
             Fix For: 1.0
         Attachments: apache-cassandra-0.8.4-SNAPSHOT.jar, src.diff, test.diff

Cassandra currently lacks support for a 4-byte fixed-size integer data type. The Java API Hector and the C libcassandra like to serialize integers as 4 bytes in network order. The problem is that you can't use cassandra-cli to manipulate the stored rows. Compatibility with other applications using an API that follows the Cassandra integer encoding standard is problematic too. Because adding a new datatype/validator is fairly simple, I recommend adding an int4 data type.

Compatibility with Hector is important because it is the most used Java Cassandra API and a lot of applications are using it. This problem was discussed several times already:
http://comments.gmane.org/gmane.comp.db.hector.user/2125
https://issues.apache.org/jira/browse/CASSANDRA-2585

It would be nice to have compatibility with cassandra-cli and other applications without rewriting Hector apps.
[jira] [Commented] (CASSANDRA-3122) SSTableSimpleUnsortedWriter take long time when inserting big rows
[ https://issues.apache.org/jira/browse/CASSANDRA-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097066#comment-13097066 ]

Sylvain Lebresne commented on CASSANDRA-3122:
---------------------------------------------

bq. every time newRow is called, serializedSize iterates through all the columns to compute the size

Yes, and I agree this isn't the most efficient thing ever, though I would be kind of surprised if this were a bottleneck. Anyway, I don't oppose improving this, but we should create a new ticket for that.

bq. An improvement in bulk loading would be to use a single-threaded ColumnFamily for bulk loading.

Yes, but we'll do it in 1.0 only, because we have CASSANDRA-2843 there, which basically makes this trivial, while it is uglier to do without it.

SSTableSimpleUnsortedWriter take long time when inserting big rows
-------------------------------------------------------------------

                 Key: CASSANDRA-3122
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3122
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.8.3
            Reporter: Benoit Perroud
            Assignee: Sylvain Lebresne
            Priority: Minor
             Fix For: 0.8.5
         Attachments: 3122.patch, SSTableSimpleUnsortedWriter-v2.patch, SSTableSimpleUnsortedWriter.patch

In SSTableSimpleUnsortedWriter, when dealing with rows having a lot of columns, if we call newRow several times (to flush data as soon as possible), the time taken by the newRow() call increases non-linearly. This is because when newRow is called, we merge the ever-growing existing CF with the new one.
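The non-linear growth described above can be shown with a toy model. This is illustrative only, not Cassandra code: it just counts how much work a merge-into-accumulated-row strategy performs when each newRow() call touches every column already accumulated.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeCostSketch {
    // Simulate merging `batches` batches of `batchSize` columns into one
    // growing row, counting the columns each merge has to touch.
    static long totalMergeWork(int batches, int batchSize) {
        List<Integer> accumulated = new ArrayList<>();
        long work = 0;
        for (int i = 0; i < batches; i++) {
            List<Integer> fresh = new ArrayList<>();
            for (int j = 0; j < batchSize; j++)
                fresh.add(i * batchSize + j);
            // the merge walks everything accumulated so far plus the new batch
            work += accumulated.size() + fresh.size();
            accumulated.addAll(fresh);
        }
        return work;
    }

    public static void main(String[] args) {
        long w10 = totalMergeWork(10, 100);
        long w100 = totalMergeWork(100, 100);
        // 10x more batches costs ~92x more merge work: quadratic, not linear
        System.out.println(w10 + " vs " + w100); // prints "5500 vs 505000"
    }
}
```

Total work is batchSize * batches * (batches + 1) / 2, i.e. O(batches^2), which matches the report that newRow() gets slower the more times it is called on a big row.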
[jira] [Commented] (CASSANDRA-3118) nodetool can not decommission a node
[ https://issues.apache.org/jira/browse/CASSANDRA-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097065#comment-13097065 ]

Zenek Kraweznik commented on CASSANDRA-3118:
--------------------------------------------

I also can't decommission a node in 0.8.4. Communication between nodes is fine; CPU and RAM utilization is OK (I have a lot of free resources). My nodes are numbered 1 to 4, and I want to disable node 4 (10.10.10.14). Node 4 is still in the Leaving state, but all transfers seem to be finished.

nodetool can not decommission a node
------------------------------------

                 Key: CASSANDRA-3118
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3118
             Project: Cassandra
          Issue Type: Bug
          Components: Tools
    Affects Versions: 0.8.4
         Environment: Cassandra 0.8.4
            Reporter: deng
             Fix For: 0.8.5
         Attachments: 3118-debug.txt

When I use nodetool ring I get the result below, and then I want to decommission the 100.86.17.90 node, but I get the error:

[root@ip bin]# ./nodetool -h 10.86.12.225 ring
Address        DC           Rack   Status  State    Load       Owns     Token
                                                                        154562542458917734942660802527609328132
100.86.17.90   datacenter1  rack1  Up      Leaving  1.08 MB    11.21%   3493450320433654773610109291263389161
100.86.12.225  datacenter1  rack1  Up      Normal   558.25 MB  14.25%   27742979166206700793970535921354744095
100.86.12.224  datacenter1  rack1  Up      Normal   5.01 GB    6.58%    38945137636148605752956920077679425910

ERROR:
[root@ip bin]# ./nodetool -h 100.86.17.90 decommission
Exception in thread "main" java.lang.UnsupportedOperationException
	at java.util.AbstractList.remove(AbstractList.java:144)
	at java.util.AbstractList$Itr.remove(AbstractList.java:360)
	at java.util.AbstractCollection.removeAll(AbstractCollection.java:337)
	at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:1041)
	at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:1006)
	at org.apache.cassandra.service.StorageService.handleStateLeaving(StorageService.java:877)
	at org.apache.cassandra.service.StorageService.onChange(StorageService.java:732)
	at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:839)
	at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:986)
	at org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1836)
	at org.apache.cassandra.service.StorageService.decommission(StorageService.java:1855)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1426)
	at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1264)
	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1359)
	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
	at sun.rmi.transport.Transport$1.run(Transport.java:159)
	at java.security.AccessController.doPrivileged(Native Method)
	at
svn commit: r1165218 - in /cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/io/sstable: SSTableSimpleWriterTest.java SSTableWriterTest.java
Author: slebresne
Date: Mon Sep  5 09:23:54 2011
New Revision: 1165218

URL: http://svn.apache.org/viewvc?rev=1165218&view=rev
Log:
Add missing file and fix type from CASSANDRA-3122 commit

Added:
    cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/io/sstable/SSTableSimpleWriterTest.java
Modified:
    cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/io/sstable/SSTableWriterTest.java

Added: cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/io/sstable/SSTableSimpleWriterTest.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/io/sstable/SSTableSimpleWriterTest.java?rev=1165218&view=auto
==============================================================================
--- cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/io/sstable/SSTableSimpleWriterTest.java (added)
+++ cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/io/sstable/SSTableSimpleWriterTest.java Mon Sep  5 09:23:54 2011
@@ -0,0 +1,104 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.cassandra.io.sstable;
+
+import java.io.File;
+
+import org.junit.Test;
+
+import org.apache.cassandra.CleanupHelper;
+import org.apache.cassandra.Util;
+import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.marshal.IntegerType;
+import static org.apache.cassandra.utils.ByteBufferUtil.bytes;
+import static org.apache.cassandra.utils.ByteBufferUtil.toInt;
+
+public class SSTableSimpleWriterTest extends CleanupHelper
+{
+    @Test
+    public void testSSTableSimpleUnsortedWriter() throws Exception
+    {
+        final int INC = 5;
+        final int NBCOL = 10;
+
+        String tablename = "Keyspace1";
+        String cfname = "StandardInteger1";
+
+        Table t = Table.open(tablename); // make sure we create the directory
+        File dir = new File(t.getDataFileLocation(0));
+        assert dir.exists();
+
+        SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(dir, tablename, cfname, IntegerType.instance, null, 16);
+
+        int k = 0;
+
+        // Adding a few rows first
+        for (; k < 10; ++k)
+        {
+            writer.newRow(bytes("Key" + k));
+            writer.addColumn(bytes(1), bytes("v"), 0);
+            writer.addColumn(bytes(2), bytes("v"), 0);
+            writer.addColumn(bytes(3), bytes("v"), 0);
+        }
+
+        // Testing multiple opening of the same row
+        // We'll write column 0, 5, 10, .., on the first row, then 1, 6, 11, ... on the second one, etc.
+        for (int i = 0; i < INC; ++i)
+        {
+            writer.newRow(bytes("Key" + k));
+            for (int j = 0; j < NBCOL; ++j)
+            {
+                writer.addColumn(bytes(i + INC * j), bytes("v"), 1);
+            }
+        }
+        k++;
+
+        // Adding a few more rows
+        for (; k < 20; ++k)
+        {
+            writer.newRow(bytes("Key" + k));
+            writer.addColumn(bytes(1), bytes("v"), 0);
+            writer.addColumn(bytes(2), bytes("v"), 0);
+            writer.addColumn(bytes(3), bytes("v"), 0);
+        }
+
+        writer.close();
+
+        // Now add that newly created files to the column family
+        ColumnFamilyStore cfs = t.getColumnFamilyStore(cfname);
+        cfs.loadNewSSTables();
+
+        // Check we get expected results
+        ColumnFamily cf = Util.getColumnFamily(t, Util.dk("Key10"), cfname);
+        assert cf.getColumnCount() == INC * NBCOL : "expecting " + (INC * NBCOL) + " columns, got " + cf.getColumnCount();
+        int i = 0;
+        for (IColumn c : cf)
+        {
+            assert toInt(c.name()) == i : "Column name should be " + i + ", got " + toInt(c.name());
+            assert c.value().equals(bytes("v"));
+            assert c.timestamp() == 1;
+            ++i;
+        }
+
+        cf = Util.getColumnFamily(t, Util.dk("Key19"), cfname);
+        assert cf.getColumnCount() == 3 : "expecting 3 columns, got " + cf.getColumnCount();
+    }
+}

Modified: cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/io/sstable/SSTableWriterTest.java
URL:
[jira] [Commented] (CASSANDRA-3122) SSTableSimpleUnsortedWriter take long time when inserting big rows
[ https://issues.apache.org/jira/browse/CASSANDRA-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097112#comment-13097112 ]

Hudson commented on CASSANDRA-3122:
-----------------------------------

Integrated in Cassandra-0.8 #313 (See [https://builds.apache.org/job/Cassandra-0.8/313/])
    Add missing file and fix type from CASSANDRA-3122 commit

slebresne : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1165218
Files :
* /cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/io/sstable/SSTableSimpleWriterTest.java
* /cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/io/sstable/SSTableWriterTest.java

SSTableSimpleUnsortedWriter take long time when inserting big rows
-------------------------------------------------------------------

                 Key: CASSANDRA-3122
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3122
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.8.3
            Reporter: Benoit Perroud
            Assignee: Sylvain Lebresne
            Priority: Minor
             Fix For: 0.8.5
         Attachments: 3122.patch, SSTableSimpleUnsortedWriter-v2.patch, SSTableSimpleUnsortedWriter.patch

In SSTableSimpleUnsortedWriter, when dealing with rows having a lot of columns, if we call newRow several times (to flush data as soon as possible), the time taken by the newRow() call increases non-linearly. This is because when newRow is called, we merge the ever-growing existing CF with the new one.
[jira] [Created] (CASSANDRA-3137) Implement wrapping intersections for ConfigHelper's InputKeyRange
Implement wrapping intersections for ConfigHelper's InputKeyRange
-----------------------------------------------------------------

                 Key: CASSANDRA-3137
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3137
             Project: Cassandra
          Issue Type: Improvement
          Components: Hadoop
    Affects Versions: 0.8.4
            Reporter: Mck SembWever
            Assignee: Mck SembWever

Previously there was no support for multiple intersections between the split's range and the job's configured range. After CASSANDRA-3108 it is now possible.
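For context, a minimal sketch of what a "wrapping" token range is and why intersecting one can yield multiple pieces. This is illustrative only; the names and the use of a plain long token are simplifications, not Cassandra's actual Range API.

```java
public class WrappingRangeSketch {
    // A range (left, right] on the token ring; left >= right means the range
    // wraps past the minimum token and covers two arcs of the ring.
    static boolean wraps(long left, long right) {
        return left >= right;
    }

    // Does token t fall within (left, right] on the ring?
    static boolean contains(long left, long right, long t) {
        if (wraps(left, right))
            return t > left || t <= right; // either of the two arcs
        return t > left && t <= right;
    }

    public static void main(String[] args) {
        // Non-wrapping range (10, 20]
        assert contains(10, 20, 15);
        assert !contains(10, 20, 25);
        // Wrapping range (90, 5] covers both the end and the start of the
        // ring, which is why intersecting it with a split's range can
        // produce more than one disjoint piece.
        assert contains(90, 5, 95);
        assert contains(90, 5, 3);
        assert !contains(90, 5, 50);
        System.out.println("ok");
    }
}
```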
[jira] [Commented] (CASSANDRA-3108) Make Range and Bounds objects client-safe
[ https://issues.apache.org/jira/browse/CASSANDRA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097126#comment-13097126 ]

Mck SembWever commented on CASSANDRA-3108:
------------------------------------------

Didn't see it until now, but your patch, Jonathan, removes the limitation that ConfigHelper's InputKeyRange cannot wrap. I've entered CASSANDRA-3137 to allow wrapping intersections in {{ColumnFamilyInputFormat}}.

Make Range and Bounds objects client-safe
-----------------------------------------

                 Key: CASSANDRA-3108
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3108
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.8.2
            Reporter: Jonathan Ellis
            Assignee: Mck SembWever
              Labels: hadoop
             Fix For: 0.8.5
         Attachments: 3108.txt

From Mck's comment on CASSANDRA-1125:

Something broke here in production once we went out with 0.8.2. It may have been some poor testing; I'm not entirely sure, and a little surprised. CFIF:135 breaks because inside dhtRange.intersects(jobRange) there's a call to new Range(token, token), which calls StorageService.getPartitioner(), and StorageService is null since we're not inside the server. A quick fix is to change Range:148 from new Range(token, token) to new Range(token, token, partitioner), making the presumption that the partitioner for the new Range is the same as this Range's.
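The failure mode described in CASSANDRA-3108, a constructor that reaches into a server-side singleton and so breaks when run inside a Hadoop client, can be sketched in miniature. The names here are illustrative stand-ins, not Cassandra's actual classes.

```java
public class ClientSafeRangeSketch {
    interface Partitioner {}

    // Stand-in for a server-only singleton like StorageService:
    // it is null when the code runs inside a Hadoop client JVM.
    static Partitioner serverPartitioner = null;

    final long left, right;
    final Partitioner partitioner;

    // Server-style constructor: blows up client-side, the way
    // new Range(token, token) did inside intersects().
    ClientSafeRangeSketch(long left, long right) {
        this(left, right, requireServerPartitioner());
    }

    // Client-safe constructor: the caller supplies the partitioner explicitly.
    ClientSafeRangeSketch(long left, long right, Partitioner p) {
        this.left = left;
        this.right = right;
        this.partitioner = p;
    }

    static Partitioner requireServerPartitioner() {
        if (serverPartitioner == null)
            throw new IllegalStateException("not running inside the server");
        return serverPartitioner;
    }

    public static void main(String[] args) {
        boolean failed = false;
        try {
            new ClientSafeRangeSketch(0, 1); // client-side: the singleton is null
        } catch (IllegalStateException e) {
            failed = true;
        }
        assert failed;

        // Passing the partitioner in keeps the object usable outside the server.
        Partitioner p = new Partitioner() {};
        assert new ClientSafeRangeSketch(0, 1, p).partitioner == p;
        System.out.println("ok");
    }
}
```

Threading the dependency through the constructor instead of resolving it from a global is what makes the object "client-safe".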
[jira] [Updated] (CASSANDRA-3137) Implement wrapping intersections for ConfigHelper's InputKeyRange
[ https://issues.apache.org/jira/browse/CASSANDRA-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mck SembWever updated CASSANDRA-3137:
-------------------------------------

    Affects Version/s:     (was: 0.8.4)
                           0.8.5

Implement wrapping intersections for ConfigHelper's InputKeyRange
-----------------------------------------------------------------

                 Key: CASSANDRA-3137
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3137
             Project: Cassandra
          Issue Type: Improvement
          Components: Hadoop
    Affects Versions: 0.8.5
            Reporter: Mck SembWever
            Assignee: Mck SembWever

Previously there was no support for multiple intersections between the split's range and the job's configured range. After CASSANDRA-3108 it is now possible.
[jira] [Updated] (CASSANDRA-3137) Implement wrapping intersections for ConfigHelper's InputKeyRange
[ https://issues.apache.org/jira/browse/CASSANDRA-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mck SembWever updated CASSANDRA-3137:
-------------------------------------

    Attachment: CASSANDRA-3137.patch

I haven't tested this (with real data) yet, but the code looks pretty simple and straightforward...

Implement wrapping intersections for ConfigHelper's InputKeyRange
-----------------------------------------------------------------

                 Key: CASSANDRA-3137
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3137
             Project: Cassandra
          Issue Type: Improvement
          Components: Hadoop
    Affects Versions: 0.8.5
            Reporter: Mck SembWever
            Assignee: Mck SembWever
         Attachments: CASSANDRA-3137.patch

Previously there was no support for multiple intersections between the split's range and the job's configured range. After CASSANDRA-3108 it is now possible.
[jira] [Commented] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097159#comment-13097159 ] Sylvain Lebresne commented on CASSANDRA-2474: - I do agree with Eric earlier on, I think this issue could stand being summarized, I'm not too sure I understand what is proposed here so far. So I apologize in advance if it turns out the propositions made above do answer everything that is below. However, it seems that we're focusing on some representation based on materialized views here. Did we focus on that because we consider the basic use cases for composite type, those where we don't use them for materialized view at all, are easy to deal with ? Why not consider composite column name for what they are, *one* column name that is composed of multiple sub-elements ? What I mean here is, I'm not that sure I'm convinced that bq. the original idea from CASSANDRA-2025 of SELECT columnA:x, columnA:y FROM foo WHERE key = 'bar' is the wrong way to go I'm even less convinced when I see the number of comments on this ticket. Again, there seems that the focus was exclusively on materialized views, but I strongly think that composite column names are useful for more than materialized view (I've used composite column names countless time, never for materialized view). But let's take an example of what I mean. Suppose that what you store in your column family are events. Those events arrive with a timestamp whose resolution is maybe the minute (or more precisely, you only care about query them at that precision). Those events have a category (that may have a sorting that make sense), and maybe a subcategory. They also have a unique identifier eventId. Moreover there is a lot of events every minutes and the category/subcategory are not necessarily predefined. The query you want to do are typically: * Give me all the events for time t, category c and sub-category sc. 
* Give me all the events for time t and category c. * Give me all the events for time t and category c1 to c2 (where c1 < c2 for the category sorting). * Give me everything for the last 4 hours. Probably most of those would require paging because there are shit tons of events, but still, I want to do those fast. I haven't found a better data model for that kind of example than using a composite column name where the name is (timestamp, category, sub-category, eventId). I haven't found in all the discussion above anything that would allow me to do this better than what is in the initial proposition of CASSANDRA-2025. Now I completely agree that having a good notation to work with materialized views would be great, but IMO if we try to find a syntax that is too far from how composite columns work, I fear we'll end up limiting the usefulness of composite types in CQL to one narrow use case. I'll note too that I haven't seen any proposal of how insertion with compound types should look. CQL support for compound columns Key: CASSANDRA-2474 URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 Project: Cassandra Issue Type: Sub-task Components: API, Core Reporter: Eric Evans Assignee: Pavel Yaskevich Labels: cql Fix For: 1.0 Attachments: screenshot-1.jpg, screenshot-2.jpg For the most part, this boils down to supporting the specification of compound column names (the CQL syntax is colon-delimited terms), and then teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
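Sylvain's event example above leans entirely on component-wise ordering of a composite name. As a hedged illustration (plain Java with invented names, not Cassandra code), a comparator over (timestamp, category, subcategory, eventId) tuples makes each of the listed queries a contiguous slice of one sorted row:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch only: a composite column name as *one* name made of sub-elements,
// compared component by component. All field names are illustrative.
public class CompositeNameSketch
{
    record Name(long minute, String category, String subcategory, long eventId)
        implements Comparable<Name>
    {
        public int compareTo(Name o)
        {
            int c = Long.compare(minute, o.minute);
            if (c != 0) return c;
            c = category.compareTo(o.category);
            if (c != 0) return c;
            c = subcategory.compareTo(o.subcategory);
            return c != 0 ? c : Long.compare(eventId, o.eventId);
        }
    }

    public static void main(String[] args)
    {
        NavigableMap<Name, String> row = new TreeMap<>();
        row.put(new Name(10, "net", "tcp", 1), "e1");
        row.put(new Name(10, "net", "udp", 2), "e2");
        row.put(new Name(10, "disk", "io", 3), "e3");
        // "all events for time t=10 and category net": one contiguous slice.
        NavigableMap<Name, String> slice =
            row.subMap(new Name(10, "net", "", Long.MIN_VALUE), true,
                       new Name(10, "net", "\uffff", Long.MAX_VALUE), true);
        System.out.println(slice.values()); // only the two "net" events
    }
}
```

The range query "category c1 to c2" is just a wider `subMap`, which is why the eventId has to be the last component: it is the tie-breaker, not a filter.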
svn commit: r1165306 - in /cassandra/trunk: ./ src/java/org/apache/cassandra/db/ src/java/org/apache/cassandra/db/context/ src/java/org/apache/cassandra/io/sstable/ src/java/org/apache/cassandra/strea
Author: slebresne
Date: Mon Sep 5 14:55:28 2011
New Revision: 1165306

URL: http://svn.apache.org/viewvc?rev=1165306&view=rev
Log:
Handle large rows with single-pass streaming

patch by yukim; reviewed by slebresne for CASSANDRA-3003

Modified:
    cassandra/trunk/CHANGES.txt
    cassandra/trunk/src/java/org/apache/cassandra/db/CounterColumn.java
    cassandra/trunk/src/java/org/apache/cassandra/db/context/CounterContext.java
    cassandra/trunk/src/java/org/apache/cassandra/io/sstable/SSTableWriter.java
    cassandra/trunk/src/java/org/apache/cassandra/streaming/IncomingStreamReader.java

Modified: cassandra/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1165306&r1=1165305&r2=1165306&view=diff
==============================================================================
--- cassandra/trunk/CHANGES.txt (original)
+++ cassandra/trunk/CHANGES.txt Mon Sep 5 14:55:28 2011
@@ -12,7 +12,7 @@
  * don't bother persisting columns shadowed by a row tombstone (CASSANDRA-2589)
  * reset CF and SC deletion times after gc_grace (CASSANDRA-2317)
  * optimize away seek when compacting wide rows (CASSANDRA-2879)
- * single-pass streaming (CASSANDRA-2677)
+ * single-pass streaming (CASSANDRA-2677, 3003)
  * use reference counting for deleting sstables instead of relying on GC (CASSANDRA-2521)
  * store hints as serialized mutations instead of pointers to data row

Modified: cassandra/trunk/src/java/org/apache/cassandra/db/CounterColumn.java
URL: http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/db/CounterColumn.java?rev=1165306&r1=1165305&r2=1165306&view=diff
==============================================================================
--- cassandra/trunk/src/java/org/apache/cassandra/db/CounterColumn.java (original)
+++ cassandra/trunk/src/java/org/apache/cassandra/db/CounterColumn.java Mon Sep 5 14:55:28 2011
@@ -21,7 +21,6 @@ package org.apache.cassandra.db;
 import java.io.IOException;
 import java.nio.ByteBuffer;
 import java.security.MessageDigest;
-import java.util.Map;
 
 import org.apache.log4j.Logger;
 
@@ -70,7 +69,9 @@ public class CounterColumn extends Colum
     public static CounterColumn create(ByteBuffer name, ByteBuffer value, long timestamp, long timestampOfLastDelete, boolean fromRemote)
     {
-        if (fromRemote)
+        // #elt being negative means we have to clean delta
+        short count = value.getShort(value.position());
+        if (fromRemote || count < 0)
             value = CounterContext.instance().clearAllDelta(value);
         return new CounterColumn(name, value, timestamp, timestampOfLastDelete);
     }
@@ -285,4 +286,8 @@ public class CounterColumn extends Colum
         }
     }
 
+    public IColumn markDeltaToBeCleared()
+    {
+        return new CounterColumn(name, contextManager.markDeltaToBeCleared(value), timestamp, timestampOfLastDelete);
+    }
 }

Modified: cassandra/trunk/src/java/org/apache/cassandra/db/context/CounterContext.java
URL: http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/db/context/CounterContext.java?rev=1165306&r1=1165305&r2=1165306&view=diff
==============================================================================
--- cassandra/trunk/src/java/org/apache/cassandra/db/context/CounterContext.java (original)
+++ cassandra/trunk/src/java/org/apache/cassandra/db/context/CounterContext.java Mon Sep 5 14:55:28 2011
@@ -130,7 +130,7 @@ public class CounterContext implements I
     private static int headerLength(ByteBuffer context)
     {
-        return HEADER_SIZE_LENGTH + context.getShort(context.position()) * HEADER_ELT_LENGTH;
+        return HEADER_SIZE_LENGTH + Math.abs(context.getShort(context.position())) * HEADER_ELT_LENGTH;
     }
 
     private static int compareId(ByteBuffer bb1, int pos1, ByteBuffer bb2, int pos2)
@@ -442,6 +442,28 @@ public class CounterContext implements I
     }
 
     /**
+     * Mark context to delete delta afterward.
+     * Marking is done by multiplying #elt by -1 to preserve header length
+     * and #elt count in order to clear all delta later.
+     *
+     * @param context a counter context
+     * @return context that marked to delete delta
+     */
+    public ByteBuffer markDeltaToBeCleared(ByteBuffer context)
+    {
+        int headerLength = headerLength(context);
+        if (headerLength == 0)
+            return context;
+
+        ByteBuffer marked = context.duplicate();
+        short count = context.getShort(context.position());
+        // negate #elt to mark as deleted, without changing its size.
+        if (count > 0)
+            marked.putShort(marked.position(), (short) (count * -1));
+        return marked;
+    }
+
+    /**
      * Remove all the delta of a context (i.e, set an empty header).
      *
      * @param context a counter context

Modified: cassandra/trunk/src/java/org/apache/cassandra/io/sstable/SSTableWriter.java
URL:
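The marking trick in this commit can be seen in isolation: negating the leading #elt short flags the context while taking the absolute value keeps the computed header length correct. A minimal standalone sketch (toy buffer layout; the HEADER_ELT_LENGTH value is assumed here, and note that duplicate() shares the underlying bytes with the original buffer):

```java
import java.nio.ByteBuffer;

// Toy model of the CounterContext header trick from this commit: the first
// short (#elt) counts header elements; negating it marks "delta to be
// cleared" without changing the header's byte length, so headerLength()
// stays correct by taking Math.abs(#elt). Constants are assumed values.
public class DeltaMarkSketch
{
    static final int HEADER_SIZE_LENGTH = 2; // one short for #elt
    static final int HEADER_ELT_LENGTH = 2;  // assumed element width (sketch only)

    static int headerLength(ByteBuffer context)
    {
        return HEADER_SIZE_LENGTH + Math.abs(context.getShort(context.position())) * HEADER_ELT_LENGTH;
    }

    static ByteBuffer markDeltaToBeCleared(ByteBuffer context)
    {
        // duplicate() shares content with 'context'; the real code has the
        // same property, so the mark is visible through both buffers.
        ByteBuffer marked = context.duplicate();
        short count = context.getShort(context.position());
        if (count > 0) // only a positive #elt still needs marking
            marked.putShort(marked.position(), (short) (count * -1));
        return marked;
    }

    public static void main(String[] args)
    {
        ByteBuffer ctx = ByteBuffer.allocate(8);
        ctx.putShort(0, (short) 2); // two header elements
        int before = headerLength(ctx);
        ByteBuffer marked = markDeltaToBeCleared(ctx);
        // sign flipped, computed header length unchanged
        System.out.println(marked.getShort(marked.position()) + " " + (headerLength(marked) == before));
    }
}
```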
[jira] [Resolved] (CASSANDRA-3003) Trunk single-pass streaming doesn't handle large row correctly
[ https://issues.apache.org/jira/browse/CASSANDRA-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne resolved CASSANDRA-3003. - Resolution: Fixed Trunk single-pass streaming doesn't handle large row correctly -- Key: CASSANDRA-3003 URL: https://issues.apache.org/jira/browse/CASSANDRA-3003 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0 Reporter: Sylvain Lebresne Assignee: Yuki Morishita Priority: Critical Labels: streaming Fix For: 1.0 Attachments: 3003-v1.txt, 3003-v2.txt, 3003-v3.txt, 3003-v5.txt, v3003-v4.txt For normal column families, trunk streaming always buffers the whole row into memory. It uses {noformat} ColumnFamily.serializer().deserializeColumns(in, cf, true, true); {noformat} on the input bytes. We must avoid this for rows that don't fit in the inMemoryLimit. Note that for regular column families, for a given row, there is actually no need to even recreate the bloom filter or column index, nor to deserialize the columns. It is enough to read the key and row size to feed the index writer, and then simply dump the rest on disk directly. This would make streaming more efficient, avoid a lot of object creation and avoid the pitfall of big rows. Counter column families are unfortunately trickier, because each column needs to be deserialized (to mark it as 'fromRemote'). However, we don't need to do the double pass of LazilyCompactedRow for that. We can simply use an SSTableIdentityIterator and deserialize/reserialize input as it comes.
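The "deserialize/reserialize input as it comes" idea for counter rows can be sketched as a constant-memory stream transform. The record layout and the flag byte below are invented for illustration; the real patch rewrites serialized CounterColumns via SSTableIdentityIterator:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hedged sketch of single-pass rewriting: read one length-prefixed column at
// a time, rewrite it (here a toy 'fromRemote' flag byte is forced on), and
// emit it immediately, so memory use is per-column, never per-row.
// record layout (invented): [1 flag byte][int length][length value bytes]
public class SinglePassSketch
{
    public static void rewrite(DataInput in, DataOutput out, int columns) throws IOException
    {
        for (int i = 0; i < columns; i++)
        {
            in.readByte();                 // discard incoming flag
            int len = in.readInt();
            byte[] value = new byte[len];  // one column's worth, not one row's
            in.readFully(value);
            out.writeByte(1);              // mark as coming from a remote node
            out.writeInt(len);
            out.write(value);
        }
    }
}
```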
[jira] [Commented] (CASSANDRA-3003) Trunk single-pass streaming doesn't handle large row correctly
[ https://issues.apache.org/jira/browse/CASSANDRA-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097165#comment-13097165 ] Sylvain Lebresne commented on CASSANDRA-3003: - lgtm, +1. Committed with a tiny change to use a cheaper array-backed column family in appendToStream, since we deserialize in order (and in a single thread).
[jira] [Commented] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097166#comment-13097166 ] Pavel Yaskevich commented on CASSANDRA-2474: If we consider that timestamp is the key and event_id, category and subcategory are the composite name, then: bq. Give me all the events for time t, category c and sub-category sc {noformat} SELECT name AS (event_id, category, subcategory), value AS event FROM events WHERE key = timestamp AND category = name AND subcategory = name; {noformat} bq. Give me all the events for time t and category c {noformat} SELECT name AS (event_id, category, *), value AS event FROM events WHERE key = timestamp AND category = name; {noformat} bq. Give me all the events for time t and category c1 to c2 (where c1 < c2 for the category sorting) {noformat} SELECT name AS (event_id, category, *), value AS event FROM events WHERE key = timestamp AND category > c1 AND category < c2; {noformat} bq. Give me everything for the last 4 hours {noformat} SELECT name AS (event_id, category, *), value AS event FROM events WHERE key > timestamp AND key < timestamp; {noformat}
[jira] [Commented] (CASSANDRA-3003) Trunk single-pass streaming doesn't handle large row correctly
[ https://issues.apache.org/jira/browse/CASSANDRA-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097174#comment-13097174 ] Hudson commented on CASSANDRA-3003: --- Integrated in Cassandra #1074 (See [https://builds.apache.org/job/Cassandra/1074/]) Handle large rows with single-pass streaming patch by yukim; reviewed by slebresne for CASSANDRA-3003 slebresne : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1165306 Files : * /cassandra/trunk/CHANGES.txt * /cassandra/trunk/src/java/org/apache/cassandra/db/CounterColumn.java * /cassandra/trunk/src/java/org/apache/cassandra/db/context/CounterContext.java * /cassandra/trunk/src/java/org/apache/cassandra/io/sstable/SSTableWriter.java * /cassandra/trunk/src/java/org/apache/cassandra/streaming/IncomingStreamReader.java
[jira] [Commented] (CASSANDRA-3091) Move the caching of KS and CF metadata in the JDBC suite from Connection to Statement
[ https://issues.apache.org/jira/browse/CASSANDRA-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097178#comment-13097178 ] Rick Shaw commented on CASSANDRA-3091: -- Let's close this in deference to CASSANDRA-2734, and get that into trunk. Move the caching of KS and CF metadata in the JDBC suite from Connection to Statement - Key: CASSANDRA-3091 URL: https://issues.apache.org/jira/browse/CASSANDRA-3091 Project: Cassandra Issue Type: Improvement Components: Drivers Affects Versions: 0.8.4 Reporter: Rick Shaw Assignee: Rick Shaw Priority: Minor Labels: JDBC Fix For: 0.8.6 Attachments: move-metadata-for-decoder-to-statement-level-v1.txt, move-metadata-for-decoder-to-statement-level-v2.txt Currently, all caching of metadata used in JDBC's {{ColumnDecoder}} class is loaded and held in the {{CassandraConnection}} class. The implication is that any schema activity on the connected server after the connection is established is not reflected in the KSs and CFs that can be accessed by the {{ResultSet}}, {{Statement}} and {{PreparedStatement}}. By moving the cached metadata to the {{Statement}} level, the currency of the metadata can be checked within the {{Statement}} and reloaded if it is seen to be absent. And by instantiating a new {{Statement}} (on any existing connection) you are assured of getting the most current copy of the metadata known to the server at the time of instantiation.
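The proposed move can be sketched as a lazily populated per-Statement cache, where a miss triggers a fresh fetch from the server. This is illustrative only; the types and loader below are placeholders, not the actual JDBC suite API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of statement-scoped metadata caching: each Statement owns its own
// cache, loaded lazily, so a new Statement on an existing connection always
// observes the schema as of its own first use. Names are invented.
public class StatementScopedMetadata
{
    private final Map<String, String> cfMetadata = new HashMap<>();
    private final Function<String, String> serverLoader; // placeholder schema fetch

    public StatementScopedMetadata(Function<String, String> serverLoader)
    {
        this.serverLoader = serverLoader;
    }

    public String metadataFor(String columnFamily)
    {
        // "reloaded if it is seen to be absent": a miss fetches current metadata
        return cfMetadata.computeIfAbsent(columnFamily, serverLoader);
    }
}
```

The design point is that cache lifetime equals Statement lifetime: staleness is bounded by how long a Statement lives, instead of by how long the Connection lives.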
[jira] [Commented] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097184#comment-13097184 ] Sylvain Lebresne commented on CASSANDRA-2474: - Well, the timestamp was not meant to be the key in my example, and the event_id needs to be the last component for this to make sense (since it is not specified in the query), but ok. Now, I don't understand how: {noformat} SELECT name AS (category, subcategory, *), value AS event FROM events WHERE key = timestamp AND category = category AND subcategory = subcat; {noformat} is fundamentally different from {noformat} SELECT category:subcat:event_id, value FROM events WHERE key = timestamp; {noformat} which is roughly the proposition from CASSANDRA-2025. And I mean fundamentally different, not just from a syntax point of view (I have nothing against using parentheses). If it is just a syntax difference, then fine. Or how {noformat} SELECT name AS (category, *), value AS event FROM events WHERE key = timestamp AND category > c1 AND category < c2; {noformat} is fundamentally different from {noformat} SELECT c1:*..c2:*, value FROM events WHERE key = timestamp; {noformat} Maybe giving an example of what is supposed to be returned would start to show the differences, but so far it seems only a difference of syntax. And the discussions above suggest that there is more than that underneath.
[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency
[ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097189#comment-13097189 ] paul cannon commented on CASSANDRA-2434: bq. Ok, so if we always prefer to bootstrap from the correct token, then I still think we should combine getRangesWithStrictSource and getRangesWithSources. Basically the logic should be, find the 'best' node to stream from. If the user requested it, also find a list of other candidates and order them by proximity. Right? I don't think so. I would still want to leave the option to stream from the closest even if the strict best node is available. node bootstrapping can violate consistency -- Key: CASSANDRA-2434 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434 Project: Cassandra Issue Type: Bug Reporter: Peter Schuller Assignee: paul cannon Fix For: 1.1 Attachments: 2434.patch.txt My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up. There are at least two issues I have with this behavior: * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping completes based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster. * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and thinks about it carefully before bootstrapping, there's no guarantee. A complication is that it not only depends on use-case whether this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given. (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.) Thoughts?
[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency
[ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097190#comment-13097190 ] paul cannon commented on CASSANDRA-2434: bq. I'm not sure I understand, are you saying that B would violate this, or just that the status quo does? I'm saying B would violate this, yes. B was bootstrap from the right token, but if that one isn't up, bootstrap from any other token preferring the closer ones, right? I'm saying we can't just automatically choose another token if the user didn't specifically say it's ok.
[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency
[ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097204#comment-13097204 ] Nick Bailey commented on CASSANDRA-2434: Paul, The suggestion was that if the 'correct' node is down, you can force the bootstrap to complete anyway (probably from the closest node, but that is transparent to the user), but only if the 'correct' node is down. It sounds like you agree with Jonathan on the more general approach though. Zhu, Repair doesn't help in the case when you lost data due to a node going down. Also, if only one node is down you should still be able to read/write at quorum and achieve consistency (assuming your replication factor is greater than 2).
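The selection policy being debated in the comments above (prefer the strict source that owns the range; consult proximity-ordered fallbacks only when the operator explicitly opted in; otherwise refuse rather than silently risk consistency) might be sketched like this. All names are invented, not the actual getRangesWithStrictSource/getRangesWithSources API:

```java
import java.util.List;
import java.util.Optional;

// Sketch of bootstrap-source selection under the policy discussed here:
// strict source first, closest live candidate only on explicit opt-in,
// otherwise fail fast instead of violating consistency expectations.
public class BootstrapSourcePolicy
{
    public static Optional<String> pickSource(Optional<String> strictSource,
                                              List<String> liveByProximity,
                                              boolean operatorAllowsNonStrict)
    {
        if (strictSource.isPresent())
            return strictSource;                        // the 'correct' token holder is up
        if (operatorAllowsNonStrict && !liveByProximity.isEmpty())
            return Optional.of(liveByProximity.get(0)); // closest live fallback
        return Optional.empty();                        // refuse to bootstrap
    }
}
```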
[jira] [Updated] (CASSANDRA-3128) Replace compression and compression_options config parameters by just a compression_options map.
[ https://issues.apache.org/jira/browse/CASSANDRA-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-3128: Attachment: 0002-Use-only-one-options-for-compression.patch 0001-Thrift-files.patch Replace compression and compression_options config parameters by just a compression_options map. Key: CASSANDRA-3128 URL: https://issues.apache.org/jira/browse/CASSANDRA-3128 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.0 Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Fix For: 1.0 Attachments: 0001-Thrift-files.patch, 0002-Use-only-one-options-for-compression.patch As suggested on CASSANDRA-3105, as long as 1.0 is not out, we could replace the 'compression' and 'compression_options' parameters by just one that would allow writing: {noformat} compression_options = { sstable_compression: SnappyCompressor, block_length_kb: 32 } {noformat} This would be more future-proof, in particular if we decide to make CASSANDRA-3015 pluggable in the future, or for CASSANDRA-3127, as this would allow us to simply evolve to say: {noformat} compression_options = { sstable_compression: SnappyCompressor, block_length_kb: 32, stream_compression: LZFCompressor } {noformat} This has the advantages of (1) not polluting CfDef and (2) leaving the option of documenting some options only in advanced documentation (if a given option is not meant for new users).
[jira] [Commented] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097225#comment-13097225 ] Pavel Yaskevich commented on CASSANDRA-2474: The core difference is that the (..,..,..) notation will return the given aliases (category, subcategory) as column names in the results.
[jira] [Commented] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097231#comment-13097231 ] Sylvain Lebresne commented on CASSANDRA-2474: - bq. The core difference is that (..,..,..) notation will return given aliases (category, subcategory) as column names in the results But how will it do that? The result of {noformat} SELECT c1:*..c2:*, value FROM events WHERE key = timestamp; {noformat} would be something like {noformat}
Key       | c1:subc1     | c1:subc2     | c1:subc3     | c2:subc1     |
timestamp | event_value1 | event_value2 | event_value3 | event_value4 |
{noformat} What does the result look like with 'given aliases (category, subcategory) as column names in the results'?
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097242#comment-13097242 ] Jonathan Ellis edited comment on CASSANDRA-2474 at 9/5/11 6:50 PM: --- bq. composite/super columns won't originally play nice with SQL syntax because it wasn't designed to query hierarchical data That's exactly what transposition solves -- taking a horizontal slice and turning it sideways into a resultset with the same set of columns. We are NOT trying to solve a more generic hierarchical data problem -- all the (leaf) data we select has to be at the same level in the hierarchy. bq. if we have 10 subcolumns do I need to list them all using component syntax You will if you are using the dense format. And let's be clear: this is NOT the recommended way to do things, because it is fragile, as described above. We want to support it, but making it beautiful is not our goal. bq. it lacks scoping therefore on the big queries it will be hard to read, e.g. SELECT component1 AS tweet_id, component2 AS username, body, location, age, value I don't understand, that seems perfectly readable to me. bq. SELECT name AS (tweet_id, username | body | location | age), value AS body This syntax is not viable for the reasons given in my previous comment. I'm happy to entertain other alternatives to the component syntax but there's no need to spend further time on this one. was (Author: jbellis): bq. composite/super columns won't originally play nice with SQL syntax because it wasn't designed to query hierarchical data That's exactly what transposition solves -- taking a horizontal slice and turning it sideways into a resultset with the same set of columns. We are NOT trying to solve a more generic hierarchical data problem -- all the (leaf) data we select has to be at the same level in the hierarchy. bq. if we have 10 subcolumns do I need to list them all using component syntax You will if you are using the dense format.
And let's be clear: this is NOT the recommended way to do things, because it is fragile, as described above. We want to support it, but making it beautiful is not our goal. bq. will potentially be hard to put into grammar because it can have ambiguous rules again because of lack of scoping bq. it lacks scoping therefore on the big queries it will be hard to read, e.g. SELECT component1 AS tweet_id, component2 AS username, body, location, age, value I don't understand, that seems perfectly readable to me. bq. SELECT name AS (tweet_id, username | body | location | age), value AS body This syntax is not viable for the reasons given in my previous comment. I'm happy to entertain other alternatives to the component syntax but there's no need to spend further time on this one.
[jira] [Commented] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097242#comment-13097242 ]

Jonathan Ellis commented on CASSANDRA-2474:
---

bq. composite/super columns won't originally play nice with SQL syntax because it wasn't designed to query hierarchical data

That's exactly what transposition solves -- taking a horizontal slice and turning it sideways into a resultset with the same set of columns. We are NOT trying to solve a more generic hierarchical data problem -- all the (leaf) data we select has to be at the same level in the hierarchy.

bq. if we have 10 subcolumns do I need to list them all using component syntax

You will if you are using the dense format. And let's be clear: this is NOT the recommended way to do things, because it is fragile, as described above. We want to support it, but making it beautiful is not our goal.

bq. will potentially be hard to put into grammar because it can have ambiguous rules again because lack of scoping

bq. it lacks scoping therefore on the big queries it will be hard to read e.g. SELECT component1 AS tweet_id, component2 AS username, body, location, age, value

I don't understand; that seems perfectly readable to me.

bq. SELECT name AS (tweet_id, username | body | location | age), value AS body

This syntax is not viable for the reasons given in my previous comment. I'm happy to entertain other alternatives to the component syntax, but there's no need to spend further time on this one.
CQL support for compound columns

Key: CASSANDRA-2474
URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
Project: Cassandra
Issue Type: Sub-task
Components: API, Core
Reporter: Eric Evans
Assignee: Pavel Yaskevich
Labels: cql
Fix For: 1.0
Attachments: screenshot-1.jpg, screenshot-2.jpg

For the most part, this boils down to supporting the specification of compound column names (the CQL syntax is colon-delimited terms), and then teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
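The description above says compound column names are colon-delimited terms that drivers must decode into structures. As a minimal illustration of that syntax level only -- the class and method names below are hypothetical, not part of any Cassandra driver, and real drivers decode the binary composite encoding rather than splitting strings:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: shows the colon-delimited *syntax* described in
// the ticket, not the drivers' actual binary decoding. Names are hypothetical.
public class CompoundName {
    // Split a colon-delimited compound column name into its component terms.
    public static List<String> components(String name) {
        return Arrays.asList(name.split(":"));
    }

    public static void main(String[] args) {
        List<String> parts = components("tweet_id:username:body");
        System.out.println(parts);  // prints [tweet_id, username, body]
    }
}
```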
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097242#comment-13097242 ]

Jonathan Ellis edited comment on CASSANDRA-2474 at 9/5/11 6:50 PM:
---

bq. composite/super columns won't originally play nice with SQL syntax because it wasn't designed to query hierarchical data

That's exactly what transposition solves -- taking a horizontal slice and turning it sideways into a resultset with the same set of columns. We are NOT trying to solve a more generic hierarchical data problem -- all the (leaf) data we select has to be at the same level in the hierarchy.

bq. if we have 10 subcolumns do I need to list them all using component syntax

You will if you are using the dense format. And let's be clear: this is NOT the recommended way to do things, because it is fragile, as described above. We want to support it [dense], but making it beautiful is not our goal. Sparse encoding will be the recommended practice.

bq. it lacks scoping therefore on the big queries it will be hard to read, e.g. SELECT component1 AS tweet_id, component2 AS username, body, location, age, value

I don't understand; that seems perfectly readable to me.

bq. SELECT name AS (tweet_id, username | body | location | age), value AS body

This syntax is not viable for the reasons given in my previous comment. I'm happy to entertain other alternatives to the component syntax, but there's no need to spend further time on this one.

was (Author: jbellis):

bq. composite/super columns won't originally play nice with SQL syntax because it wasn't designed to query hierarchical data

That's exactly what transposition solves -- taking a horizontal slice and turning it sideways into a resultset with the same set of columns. We are NOT trying to solve a more generic hierarchical data problem -- all the (leaf) data we select has to be at the same level in the hierarchy.

bq. if we have 10 subcolumns do I need to list them all using component syntax

You will if you are using the dense format. And let's be clear: this is NOT the recommended way to do things, because it is fragile, as described above. We want to support it, but making it beautiful is not our goal.

bq. it lacks scoping therefore on the big queries it will be hard to read, e.g. SELECT component1 AS tweet_id, component2 AS username, body, location, age, value

I don't understand; that seems perfectly readable to me.

bq. SELECT name AS (tweet_id, username | body | location | age), value AS body

This syntax is not viable for the reasons given in my previous comment. I'm happy to entertain other alternatives to the component syntax, but there's no need to spend further time on this one.

CQL support for compound columns

Key: CASSANDRA-2474
URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
Project: Cassandra
Issue Type: Sub-task
Components: API, Core
Reporter: Eric Evans
Assignee: Pavel Yaskevich
Labels: cql
Fix For: 1.0
Attachments: screenshot-1.jpg, screenshot-2.jpg

For the most part, this boils down to supporting the specification of compound column names (the CQL syntax is colon-delimited terms), and then teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3031) Add 4 byte integer type
[ https://issues.apache.org/jira/browse/CASSANDRA-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097244#comment-13097244 ]

Jonathan Ellis commented on CASSANDRA-3031:
---

I hate to do that because of the WTF-inducement it will have on novices. CQL is young enough that we should be trying to optimize for the thousands of people who have never tried it yet, not the tens of people (being generous) who have used it in a real system. Calling it 1.0 instead of 0.1 doesn't change that.

Add 4 byte integer type
---
Key: CASSANDRA-3031
URL: https://issues.apache.org/jira/browse/CASSANDRA-3031
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 0.8.4
Environment: any
Reporter: Radim Kolar
Priority: Minor
Labels: hector, lhf
Fix For: 1.0
Attachments: apache-cassandra-0.8.4-SNAPSHOT.jar, src.diff, test.diff

Cassandra currently lacks support for a 4-byte fixed-size integer data type. The Java API Hector and the C libcassandra like to serialize integers as 4 bytes in network order. The problem is that you can't use cassandra-cli to manipulate stored rows. Compatibility with other applications using an API that follows the Cassandra integer encoding standard is problematic too. Because adding a new datatype/validator is fairly simple, I recommend adding an int4 data type. Compatibility with Hector is important because it is the most-used Java Cassandra API and a lot of applications are using it. This problem was discussed several times already:

http://comments.gmane.org/gmane.comp.db.hector.user/2125
https://issues.apache.org/jira/browse/CASSANDRA-2585

It would be nice to have compatibility with cassandra-cli and other applications without rewriting Hector apps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
svn commit: r1165388 - /cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/
Author: jbellis
Date: Mon Sep 5 18:55:02 2011
New Revision: 1165388

URL: http://svn.apache.org/viewvc?rev=1165388&view=rev
Log:
clean up JDBC class declarations and accessibility modifiers

patch by Rick Shaw for CASSANDRA-3135

Modified:
    cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractCassandraConnection.java
    cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractResultSet.java
    cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractStatement.java
    cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/CResultSet.java
    cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraConnection.java
    cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraDatabaseMetaData.java
    cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraPreparedStatement.java
    cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/ColumnDecoder.java

Modified: cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractCassandraConnection.java
URL: http://svn.apache.org/viewvc/cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractCassandraConnection.java?rev=1165388&r1=1165387&r2=1165388&view=diff
==
--- cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractCassandraConnection.java (original)
+++ cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractCassandraConnection.java Mon Sep 5 18:55:02 2011
@@ -33,7 +33,7 @@
 import java.sql.Savepoint;
 import java.sql.Struct;
 import java.util.Map;
-public class AbstractCassandraConnection
+abstract class AbstractCassandraConnection
 {
     protected static final String NOT_SUPPORTED = "the Cassandra implementation does not support this method";

Modified: cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractResultSet.java
URL: http://svn.apache.org/viewvc/cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractResultSet.java?rev=1165388&r1=1165387&r2=1165388&view=diff
==
--- cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractResultSet.java (original)
+++ cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractResultSet.java Mon Sep 5 18:55:02 2011
@@ -26,7 +26,7 @@
 import java.sql.*;
 import java.util.Map;
 /** a class to hold all the unimplemented crap */
-class AbstractResultSet
+abstract class AbstractResultSet
 {
     protected static final String NOT_SUPPORTED = "the Cassandra implementation does not support this method";

Modified: cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractStatement.java
URL: http://svn.apache.org/viewvc/cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractStatement.java?rev=1165388&r1=1165387&r2=1165388&view=diff
==
--- cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractStatement.java (original)
+++ cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractStatement.java Mon Sep 5 18:55:02 2011
@@ -24,7 +24,7 @@
 import java.sql.ResultSet;
 import java.sql.SQLException;
 import java.sql.SQLFeatureNotSupportedException;
-public class AbstractStatement
+abstract class AbstractStatement
 {
     protected static final String NOT_SUPPORTED = "the Cassandra implementation does not support this method";

Modified: cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/CResultSet.java
URL: http://svn.apache.org/viewvc/cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/CResultSet.java?rev=1165388&r1=1165387&r2=1165388&view=diff
==
--- cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/CResultSet.java (original)
+++ cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/CResultSet.java Mon Sep 5 18:55:02 2011
@@ -36,7 +36,7 @@ import org.apache.cassandra.thrift.CqlRe
 import org.apache.cassandra.thrift.CqlRow;
 import org.apache.cassandra.utils.ByteBufferUtil;
-public class CResultSet extends AbstractResultSet implements CassandraResultSet
+class CResultSet extends AbstractResultSet implements CassandraResultSet
 {
     public static final int DEFAULT_TYPE = ResultSet.TYPE_FORWARD_ONLY;
     public static final int DEFAULT_CONCURRENCY = ResultSet.CONCUR_READ_ONLY;

Modified: cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraConnection.java
URL: http://svn.apache.org/viewvc/cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraConnection.java?rev=1165388&r1=1165387&r2=1165388&view=diff
==
---
[jira] [Updated] (CASSANDRA-3135) Tighten class accessibility in JDBC Suite
[ https://issues.apache.org/jira/browse/CASSANDRA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3135:
--
Affects Version/s: (was: 0.8.4)

Tighten class accessibility in JDBC Suite
-
Key: CASSANDRA-3135
URL: https://issues.apache.org/jira/browse/CASSANDRA-3135
Project: Cassandra
Issue Type: Improvement
Components: Drivers
Reporter: Rick Shaw
Assignee: Rick Shaw
Priority: Trivial
Labels: JDBC
Attachments: tighten-accessability.txt

Tighten up class accessibility: remove the {{public}} modifier from classes in the suite that are not intended to be instantiated directly by a client. In addition, give abstract named classes the {{abstract}} modifier. And finally, mark methods that are not part of public interfaces but are shared within the package as {{protected}}.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
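The pattern this ticket applies -- shared "unimplemented" plumbing in abstract, package-private base classes so clients can neither instantiate nor reference them -- can be sketched as below. The class names are illustrative stand-ins, not the actual driver classes; the check merely confirms the modifiers via reflection:

```java
import java.lang.reflect.Modifier;

// Stand-in for the driver's shared base class: abstract (cannot be
// instantiated) and package-private (not visible outside the package).
abstract class AbstractResultSetSketch {
    protected static final String NOT_SUPPORTED =
        "the Cassandra implementation does not support this method";
}

// Concrete subclass, still package-private: only intended entry points
// would carry the public modifier.
class ResultSetSketch extends AbstractResultSetSketch { }

public class AccessibilityDemo {
    public static void main(String[] args) {
        int mods = AbstractResultSetSketch.class.getModifiers();
        // abstract and not public: clients can neither instantiate nor see it
        System.out.println(Modifier.isAbstract(mods) && !Modifier.isPublic(mods));  // prints true
    }
}
```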
[jira] [Commented] (CASSANDRA-957) convenience workflow for replacing dead node
[ https://issues.apache.org/jira/browse/CASSANDRA-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097250#comment-13097250 ]

Jonathan Ellis commented on CASSANDRA-957:
--

Can you add a short how-to to NEWS.txt describing this feature?

convenience workflow for replacing dead node

Key: CASSANDRA-957
URL: https://issues.apache.org/jira/browse/CASSANDRA-957
Project: Cassandra
Issue Type: Wish
Components: Core, Tools
Affects Versions: 0.8.2
Reporter: Jonathan Ellis
Assignee: Vijay
Fix For: 1.0
Attachments: 0001-Support-bringing-back-a-node-to-the-cluster-that-exi.patch, 0001-support-for-replace-token-v3.patch, 0001-support-token-replace-v4.patch, 0001-support-token-replace-v5.patch, 0001-support-token-replace-v6.patch, 0001-support-token-replace-v7.patch, 0002-Do-not-include-local-node-when-computing-workMap.patch, 0002-hints-on-token-than-ip-v4.patch, 0002-hints-on-token-than-ip-v5.patch, 0002-hints-on-token-than-ip-v6.patch, 0002-upport-for-hints-on-token-v3.patch
Original Estimate: 24h
Remaining Estimate: 24h

Replacing a dead node with a new one is a common operation, but nodetool removetoken followed by bootstrap is inefficient (re-replicating data first to the remaining nodes, then to the new one), and manually bootstrapping to a token just less than the old one's, followed by nodetool removetoken, is slightly painful and prone to manual errors.

First question: how would you expose this in our tool ecosystem? It needs to be a startup-time option to the new node, so it can't be nodetool, and messing with the config xml definitely takes the convenience out. A one-off -DreplaceToken=XXY argument?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-1311) Support (asynchronous) triggers
[ https://issues.apache.org/jira/browse/CASSANDRA-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097252#comment-13097252 ]

Jonathan Ellis commented on CASSANDRA-1311:
---

bq. The big minus for the replica level triggers is that no one really wants to get N triggers

That's because, as I've said before, this only really makes sense to me for user-level data once you have entity groups. Coordinator-level triggers are an ugly ball of corner cases that add no functionality over what you can do with a well-designed app-level storage layer. It's an idea that is superficially attractive but is a non-starter once you dig deeper.

Support (asynchronous) triggers
---
Key: CASSANDRA-1311
URL: https://issues.apache.org/jira/browse/CASSANDRA-1311
Project: Cassandra
Issue Type: New Feature
Components: Contrib
Reporter: Maxim Grinev
Fix For: 1.1
Attachments: HOWTO-PatchAndRunTriggerExample-update1.txt, HOWTO-PatchAndRunTriggerExample.txt, ImplementationDetails-update1.pdf, ImplementationDetails.pdf, trunk-967053.txt, trunk-984391-update1.txt, trunk-984391-update2.txt

Asynchronous triggers are a basic mechanism to implement various use cases of asynchronous execution of application code at the database side, for example to support indexes and materialized views, online analytics, and push-based data propagation.

Please find the motivation, trigger description, and list of applications here:
http://maxgrinev.com/2010/07/23/extending-cassandra-with-asynchronous-triggers/

An example of using triggers for indexing:
http://maxgrinev.com/2010/07/23/managing-indexes-in-cassandra-using-async-triggers/

Implementation details are attached.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3069) multiget support in CQL (UNION / OR)
[ https://issues.apache.org/jira/browse/CASSANDRA-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3069:
--
Fix Version/s: (was: 1.0) 1.1
Summary: multiget support in CQL (UNION / OR) (was: UNION support)

Pushing to 1.1 -- transposition is more important for 1.0

multiget support in CQL (UNION / OR)

Key: CASSANDRA-3069
URL: https://issues.apache.org/jira/browse/CASSANDRA-3069
Project: Cassandra
Issue Type: Sub-task
Components: API, Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
Fix For: 1.1
Attachments: CASSANDRA-3069.patch

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3069) UNION support
[ https://issues.apache.org/jira/browse/CASSANDRA-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097262#comment-13097262 ]

Jonathan Ellis commented on CASSANDRA-3069:
---

The more I think about it, the more I think that the right thing to do is not to support UNION (which does not map cleanly to StorageProxy calls) but to support ORing keys together in the WHERE clause (which maps nicely to multiget). The downside is we need to explain why we support just this one special case of ORs but not others, but I think that is better than trying to explain other limitations around UNION.

UNION support
-
Key: CASSANDRA-3069
URL: https://issues.apache.org/jira/browse/CASSANDRA-3069
Project: Cassandra
Issue Type: Sub-task
Components: API, Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
Fix For: 1.1
Attachments: CASSANDRA-3069.patch

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
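A toy sketch of why OR-ing keys "maps nicely to multiget": the OR'd key list collapses into a single batched lookup rather than a general set union. The store shape and names below are hypothetical illustrations, not Cassandra internals:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a WHERE clause of the shape
//   key = k1 OR key = k2 OR ...
// reduces to one multiget over the listed keys, which is why supporting
// just this special case of OR is tractable while general UNION is not.
public class MultigetSketch {
    static Map<String, String> store = new HashMap<>();

    // Equivalent of: SELECT ... WHERE key = k1 OR key = k2 OR ...
    static Map<String, String> multiget(List<String> keys) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String k : keys) {
            if (store.containsKey(k)) result.put(k, store.get(k));
        }
        return result;
    }

    public static void main(String[] args) {
        store.put("k1", "v1");
        store.put("k2", "v2");
        System.out.println(multiget(Arrays.asList("k1", "k2", "k3")));  // prints {k1=v1, k2=v2}
    }
}
```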
[jira] [Commented] (CASSANDRA-3124) java heap limit for nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097266#comment-13097266 ]

Jonathan Ellis commented on CASSANDRA-3124:
---

So you're saying that 64 / 96 MB is _too large_ and you have to reduce it for nodetool to run? I'd say there's something wrong with your environment.

java heap limit for nodetool

Key: CASSANDRA-3124
URL: https://issues.apache.org/jira/browse/CASSANDRA-3124
Project: Cassandra
Issue Type: Improvement
Components: Core, Tools
Affects Versions: 0.8.1, 0.8.2, 0.8.3, 0.8.4
Environment: not important
Reporter: Zenek Kraweznik
Priority: Minor

By default (from the Debian package):

# nodetool
Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.
#

and:

--- /usr/bin/nodetool.old 2011-09-02 14:15:14.228152799 +0200
+++ /usr/bin/nodetool 2011-09-02 14:14:28.745154552 +0200
@@ -55,7 +55,7 @@
 ;;
 esac

-$JAVA -cp $CLASSPATH -Dstorage-config=$CASSANDRA_CONF \
+$JAVA -Xmx32m -cp $CLASSPATH -Dstorage-config=$CASSANDRA_CONF \
     -Dlog4j.configuration=log4j-tools.properties \
     org.apache.cassandra.tools.NodeCmd $@

After every upgrade I had to add the limit manually. I think it's a good idea to add it by default ;)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
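When debugging a "Could not reserve enough space for object heap" failure like the one quoted in this ticket, it can help to see what heap ceiling a JVM actually received. A minimal sketch (the class name is mine; run it with, e.g., `java -Xmx32m HeapCheck` to mimic the patched nodetool script):

```java
// Print the maximum heap the running JVM will attempt to use.
// Useful for comparing the environment's default against an explicit -Xmx.
public class HeapCheck {
    public static void main(String[] args) {
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max heap (MB): " + maxMb);
    }
}
```

With `-Xmx32m` the printed value will be close to (slightly under) 32 MB, which is plenty for a thin JMX client like nodetool.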
[jira] [Commented] (CASSANDRA-3133) nodetool netstats doesn't show streams during decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097268#comment-13097268 ]

Jonathan Ellis commented on CASSANDRA-3133:
---

So really it is not certain whether the problem is nodetool not showing streams, or decommission not finishing after streaming is complete?

nodetool netstats doesn't show streams during decommission
--
Key: CASSANDRA-3133
URL: https://issues.apache.org/jira/browse/CASSANDRA-3133
Project: Cassandra
Issue Type: Bug
Components: Tools
Affects Versions: 0.8.4
Environment: debian 6.0.2.1 (squeeze), java 1.6.26 (Sun, non-free packages)
Reporter: Zenek Kraweznik

nodetool netstats is not showing transferred files from decommission

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3108) Make Range and Bounds objects client-safe
[ https://issues.apache.org/jira/browse/CASSANDRA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097271#comment-13097271 ]

Jonathan Ellis commented on CASSANDRA-3108:
---

That was unintentional -- how did I do that?

Make Range and Bounds objects client-safe
-
Key: CASSANDRA-3108
URL: https://issues.apache.org/jira/browse/CASSANDRA-3108
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Jonathan Ellis
Assignee: Mck SembWever
Labels: hadoop
Fix For: 0.8.5
Attachments: 3108.txt

From Mck's comment on CASSANDRA-1125:

Something broke here in production once we went out with 0.8.2. It may have been some poor testing; I'm not entirely sure and a little surprised. CFIF:135 breaks because inside dhtRange.intersects(jobRange) there's a call to new Range(token, token), which calls StorageService.getPartitioner(), and StorageService is null as we're not inside the server. A quick fix is to change Range:148 from new Range(token, token) to new Range(token, token, partitioner), making the presumption that the partitioner for the new Range will be the same as this Range.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
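The quick fix described above -- `new Range(token, token, partitioner)` instead of relying on the static `StorageService.getPartitioner()` -- is essentially dependency injection. A simplified sketch with stand-in types (these are not Cassandra's real classes; they only illustrate why the explicit-partitioner constructor is client-safe):

```java
// Stand-in types, not the real Cassandra API: they show why passing the
// partitioner explicitly works in a client while a static lookup does not.
interface Partitioner {
    String name();
}

class ServerRegistry {
    // In the server this would be initialized at startup;
    // in a Hadoop client it stays null -- hence the reported NPE path.
    static Partitioner partitioner = null;
}

class TokenRange {
    final String left, right;
    final Partitioner partitioner;

    // Fragile: depends on server-only static state.
    TokenRange(String left, String right) {
        this(left, right, ServerRegistry.partitioner);
    }

    // Client-safe: the caller supplies the partitioner it already has.
    TokenRange(String left, String right, Partitioner p) {
        this.left = left;
        this.right = right;
        this.partitioner = p;
    }
}

public class RangeDemo {
    public static void main(String[] args) {
        Partitioner p = () -> "RandomPartitioner";
        TokenRange safe = new TokenRange("0", "0", p);
        System.out.println(safe.partitioner.name());  // prints RandomPartitioner
    }
}
```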
[jira] [Resolved] (CASSANDRA-3136) Allow CFIF to keep going despite unavailable ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-3136.
---
Resolution: Won't Fix

As explained when this was injected into another ticket, supporting this very niche scenario is not worth adding complexity to our Hadoop interface. The right way to support fault-tolerant queries is to increase RF.

Allow CFIF to keep going despite unavailable ranges
---
Key: CASSANDRA-3136
URL: https://issues.apache.org/jira/browse/CASSANDRA-3136
Project: Cassandra
Issue Type: Improvement
Components: Hadoop
Reporter: Mck SembWever
Priority: Minor

From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902

Use case 1 (from Patrik Modesto):
We use Cassandra as storage for web pages; we store the HTML, all URLs that have the same HTML data, and some computed data. We run Hadoop MR jobs to compute lexical and thematic data for each page and to export the data to binary files for later use. A URL gets into Cassandra on a user request (a pageview), so if we delete a URL it comes back quickly if the page is active. Because of that, and because there is lots of data, we have the keyspace set to RF=1. We can drop the whole keyspace and it will regenerate quickly and contain only fresh data, so we don't care about losing a node.

Use case 2:
Trying to extract a small random sample (like a Pig SAMPLE) of data out of Cassandra.

Use case 3:
Searching for something or some pattern where one hit is enough. If you get the hit, it's a positive result regardless of whether ranges were ignored; if you don't, and you *know* there was a range ignored along the way, you can re-run the job later. For example, such a job could be run at regular intervals during the day until a hit was found.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3135) Tighten class accessibility in JDBC Suite
[ https://issues.apache.org/jira/browse/CASSANDRA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097274#comment-13097274 ]

Hudson commented on CASSANDRA-3135:
---

Integrated in Cassandra #1075 (See [https://builds.apache.org/job/Cassandra/1075/])

clean up JDBC class declarations and accessibility modifiers
patch by Rick Shaw for CASSANDRA-3135

jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1165388
Files :
* /cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractCassandraConnection.java
* /cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractResultSet.java
* /cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/AbstractStatement.java
* /cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/CResultSet.java
* /cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraConnection.java
* /cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraDatabaseMetaData.java
* /cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraPreparedStatement.java
* /cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/ColumnDecoder.java

Tighten class accessibility in JDBC Suite
-
Key: CASSANDRA-3135
URL: https://issues.apache.org/jira/browse/CASSANDRA-3135
Project: Cassandra
Issue Type: Improvement
Components: Drivers
Reporter: Rick Shaw
Assignee: Rick Shaw
Priority: Trivial
Labels: JDBC
Attachments: tighten-accessability.txt

Tighten up class accessibility: remove the {{public}} modifier from classes in the suite that are not intended to be instantiated directly by a client. In addition, give abstract named classes the {{abstract}} modifier. And finally, mark methods that are not part of public interfaces but are shared within the package as {{protected}}.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-3031) Add 4 byte integer type
[ https://issues.apache.org/jira/browse/CASSANDRA-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097244#comment-13097244 ]

Jonathan Ellis edited comment on CASSANDRA-3031 at 9/5/11 7:26 PM:
---

bq. How about deprecating int and introducing int4 and int8?

I hate to do that because of the WTF-inducement it will have on novices. CQL is young enough that we should be trying to optimize for the thousands of people who have never tried it yet, not the tens of people (being generous) who have used it in a real system. Calling it 1.0 instead of 0.1 doesn't change that.

was (Author: jbellis):

I hate to do that because of the WTF-inducement it will have on novices. CQL is young enough that we should be trying to optimize for the thousands of people who have never tried it yet, not the tens of people (being generous) who have used it in a real system. Calling it 1.0 instead of 0.1 doesn't change that.

Add 4 byte integer type
---
Key: CASSANDRA-3031
URL: https://issues.apache.org/jira/browse/CASSANDRA-3031
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 0.8.4
Environment: any
Reporter: Radim Kolar
Priority: Minor
Labels: hector, lhf
Fix For: 1.0
Attachments: apache-cassandra-0.8.4-SNAPSHOT.jar, src.diff, test.diff

Cassandra currently lacks support for a 4-byte fixed-size integer data type. The Java API Hector and the C libcassandra like to serialize integers as 4 bytes in network order. The problem is that you can't use cassandra-cli to manipulate stored rows. Compatibility with other applications using an API that follows the Cassandra integer encoding standard is problematic too. Because adding a new datatype/validator is fairly simple, I recommend adding an int4 data type. Compatibility with Hector is important because it is the most-used Java Cassandra API and a lot of applications are using it. This problem was discussed several times already:

http://comments.gmane.org/gmane.comp.db.hector.user/2125
https://issues.apache.org/jira/browse/CASSANDRA-2585

It would be nice to have compatibility with cassandra-cli and other applications without rewriting Hector apps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-3031) Add 4 byte integer type
[ https://issues.apache.org/jira/browse/CASSANDRA-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097244#comment-13097244 ]

Jonathan Ellis edited comment on CASSANDRA-3031 at 9/5/11 7:27 PM:
---

bq. How about deprecating int and introducing int4 and int8?

I hate to do that because of the WTF-inducement it will have on novices. CQL is young enough that we should be trying to optimize for the thousands of people who have never tried it yet, not the tens of people (being generous) who have used it in a real system. Calling [the original CQL API released in 0.8] 1.0 instead of 0.1 doesn't change that.

was (Author: jbellis):

bq. How about deprecating int and introducing int4 and int8?

I hate to do that because of the WTF-inducement it will have on novices. CQL is young enough that we should be trying to optimize for the thousands of people who have never tried it yet, not the tens of people (being generous) who have used it in a real system. Calling it 1.0 instead of 0.1 doesn't change that.

Add 4 byte integer type
---
Key: CASSANDRA-3031
URL: https://issues.apache.org/jira/browse/CASSANDRA-3031
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 0.8.4
Environment: any
Reporter: Radim Kolar
Priority: Minor
Labels: hector, lhf
Fix For: 1.0
Attachments: apache-cassandra-0.8.4-SNAPSHOT.jar, src.diff, test.diff

Cassandra currently lacks support for a 4-byte fixed-size integer data type. The Java API Hector and the C libcassandra like to serialize integers as 4 bytes in network order. The problem is that you can't use cassandra-cli to manipulate stored rows. Compatibility with other applications using an API that follows the Cassandra integer encoding standard is problematic too. Because adding a new datatype/validator is fairly simple, I recommend adding an int4 data type. Compatibility with Hector is important because it is the most-used Java Cassandra API and a lot of applications are using it. This problem was discussed several times already:

http://comments.gmane.org/gmane.comp.db.hector.user/2125
https://issues.apache.org/jira/browse/CASSANDRA-2585

It would be nice to have compatibility with cassandra-cli and other applications without rewriting Hector apps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-3140) Expose server, api versions to CQL
Expose server, api versions to CQL
--
Key: CASSANDRA-3140
URL: https://issues.apache.org/jira/browse/CASSANDRA-3140
Project: Cassandra
Issue Type: New Feature
Reporter: Jonathan Ellis
Priority: Minor
Fix For: 1.0

Need to expose the CQL api version; might as well include the server version while we're at it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3140) Expose server, api versions to CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097278#comment-13097278 ]

Jonathan Ellis commented on CASSANDRA-3140:
---

Maybe just SELECT api_version(), server_version() ? Open to suggestions that are less one-off-ish.

Expose server, api versions to CQL
--
Key: CASSANDRA-3140
URL: https://issues.apache.org/jira/browse/CASSANDRA-3140
Project: Cassandra
Issue Type: New Feature
Reporter: Jonathan Ellis
Priority: Minor
Fix For: 1.0

Need to expose the CQL api version; might as well include the server version while we're at it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3031) Add 4 byte integer type
[ https://issues.apache.org/jira/browse/CASSANDRA-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097279#comment-13097279 ] Jonathan Ellis commented on CASSANDRA-3031: --- Speaking of CQL versioning, created CASSANDRA-3140. Add 4 byte integer type --- Key: CASSANDRA-3031 URL: https://issues.apache.org/jira/browse/CASSANDRA-3031 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8.4 Environment: any Reporter: Radim Kolar Priority: Minor Labels: hector, lhf Fix For: 1.0 Attachments: apache-cassandra-0.8.4-SNAPSHOT.jar, src.diff, test.diff Cassandra currently lacks support for a 4-byte fixed-size integer data type. The Java API Hector and the C libcassandra library like to serialize integers as 4 bytes in network order. The problem is that you can't use cassandra-cli to manipulate the stored rows. Compatibility with other applications using an API that follows the Cassandra integer encoding standard is problematic too. Because adding a new datatype/validator is fairly simple, I recommend adding an int4 data type. Compatibility with Hector is important because it is the most used Java Cassandra API and a lot of applications are using it. This problem has been discussed several times already: http://comments.gmane.org/gmane.comp.db.hector.user/2125 https://issues.apache.org/jira/browse/CASSANDRA-2585 It would be nice to have compatibility with cassandra-cli and other applications without rewriting Hector apps. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3136) Allow CFIF to keep going despite unavailable ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097280#comment-13097280 ] Mck SembWever commented on CASSANDRA-3136: -- Ok... it was mentioned in CASSANDRA-2388 (by Patrik Modesto), but no one there paid it any attention as it didn't belong to that issue. Allow CFIF to keep going despite unavailable ranges --- Key: CASSANDRA-3136 URL: https://issues.apache.org/jira/browse/CASSANDRA-3136 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Mck SembWever Priority: Minor From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902 use-case-1 from=Patrik Modesto We use Cassandra as a storage for web pages; we store the HTML, all URLs that have the same HTML data, and some computed data. We run Hadoop MR jobs to compute lexical and thematic data for each page and to export the data to binary files for later use. A URL gets to Cassandra on user request (a pageview), so if we delete a URL, it comes back quickly if the page is active. Because of that, and because there is lots of data, we have the keyspace set to RF=1. We can drop the whole keyspace and it will regenerate quickly and would contain only fresh data, so we don't care about losing a node. /use-case-1 use-case-2 trying to extract a small random sample (like a pig SAMPLE) of data out of cassandra. /use-case-2 use-case-3 searching for something or some pattern where one hit is enough. If you get the hit it's a positive result regardless of whether ranges were ignored; if you don't, and you *know* there was a range ignored along the way, you can re-run the job later. For example such a job could be run at regular intervals in the day until a hit was found. /use-case-3 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3108) Make Range and Bounds objects client-safe
[ https://issues.apache.org/jira/browse/CASSANDRA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097282#comment-13097282 ] Mck SembWever commented on CASSANDRA-3108: -- You drastically removed the usage of the {{Range(left, right)}} constructor so that even the usage of {{intersectionBothWrapping(..)}} and {{intersectionOneWrapping(..)}} avoids any server-side calls. Make Range and Bounds objects client-safe - Key: CASSANDRA-3108 URL: https://issues.apache.org/jira/browse/CASSANDRA-3108 Project: Cassandra Issue Type: Bug Affects Versions: 0.8.2 Reporter: Jonathan Ellis Assignee: Mck SembWever Labels: hadoop Fix For: 0.8.5 Attachments: 3108.txt From Mck's comment on CASSANDRA-1125: Something broke here in production once we went out with 0.8.2. It may have been some poor testing, i'm not entirely sure and a little surprised. CFIF:135 breaks because inside dhtRange.intersects(jobRange) there's a call to new Range(token, token) which calls StorageService.getPartitioner() and StorageService is null as we're not inside the server. A quick fix is to change Range:148 from new Range(token, token) to new Range(token, token, partitioner) making the presumption that the partitioner for the new Range will be the same as this Range. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-3108) Make Range and Bounds objects client-safe
[ https://issues.apache.org/jira/browse/CASSANDRA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097282#comment-13097282 ] Mck SembWever edited comment on CASSANDRA-3108 at 9/5/11 7:41 PM: -- You drastically removed the usage of the {{Range(left, right)}} constructor so that even the usage of {{intersectionBothWrapping(..)}} and {{intersectionOneWrapping(..)}} avoids any server-side calls. In CFIF there AFAIK doesn't seem any other limitation to wrapping ranges... was (Author: michaelsembwever): You drastically removed the usage of the {{Range(left, right)}} constructor so that even the usage of {{intersectionBothWrapping(..)}} and {{intersectionOneWrapping(..)}} avoids any server-side calls. It CFIF there AFAIK doesn't seem any other limitation to wrapping ranges... Make Range and Bounds objects client-safe - Key: CASSANDRA-3108 URL: https://issues.apache.org/jira/browse/CASSANDRA-3108 Project: Cassandra Issue Type: Bug Affects Versions: 0.8.2 Reporter: Jonathan Ellis Assignee: Mck SembWever Labels: hadoop Fix For: 0.8.5 Attachments: 3108.txt From Mck's comment on CASSANDRA-1125: Something broke here in production once we went out with 0.8.2. It may have been some poor testing, i'm not entirely sure and a little surprised. CFIF:135 breaks because inside dhtRange.intersects(jobRange) there's a call to new Range(token, token) which calls StorageService.getPartitioner() and StorageService is null as we're not inside the server. A quick fix is to change Range:148 from new Range(token, token) to new Range(token, token, partitioner) making the presumption that the partitioner for the new Range will be the same as this Range. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-3108) Make Range and Bounds objects client-safe
[ https://issues.apache.org/jira/browse/CASSANDRA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097282#comment-13097282 ] Mck SembWever edited comment on CASSANDRA-3108 at 9/5/11 7:41 PM: -- You drastically removed the usage of the {{Range(left, right)}} constructor so that even the usage of {{intersectionBothWrapping(..)}} and {{intersectionOneWrapping(..)}} avoids any server-side calls. It CFIF there AFAIK doesn't seem any other limitation to wrapping ranges... was (Author: michaelsembwever): You drastically removed the usage of the {{Range(left, right)}} constructor so that even the usage of {{intersectionBothWrapping(..)}} and {{intersectionOneWrapping(..)}} avoids any server-side calls. Make Range and Bounds objects client-safe - Key: CASSANDRA-3108 URL: https://issues.apache.org/jira/browse/CASSANDRA-3108 Project: Cassandra Issue Type: Bug Affects Versions: 0.8.2 Reporter: Jonathan Ellis Assignee: Mck SembWever Labels: hadoop Fix For: 0.8.5 Attachments: 3108.txt From Mck's comment on CASSANDRA-1125: Something broke here in production once we went out with 0.8.2. It may have been some poor testing, i'm not entirely sure and a little surprised. CFIF:135 breaks because inside dhtRange.intersects(jobRange) there's a call to new Range(token, token) which calls StorageService.getPartitioner() and StorageService is null as we're not inside the server. A quick fix is to change Range:148 from new Range(token, token) to new Range(token, token, partitioner) making the presumption that the partitioner for the new Range will be the same as this Range. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
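The client-safety issue behind this ticket can be shown with a toy sketch: passing the partitioner into the Range constructor removes the need for a server-side singleton lookup (the StorageService.getPartitioner() call that is null outside the server). All names below are illustrative stand-ins, not Cassandra's real Range class:

```java
public class RangeSketch {
    // Illustrative stand-in for Cassandra's IPartitioner.
    interface Partitioner { String name(); }

    static final class Range {
        final long left, right;
        final Partitioner partitioner;

        // Client-safe: the caller supplies the partitioner explicitly, so no
        // call into a server-only singleton is ever needed on the client side.
        Range(long left, long right, Partitioner partitioner) {
            this.left = left;
            this.right = right;
            this.partitioner = partitioner;
        }

        // (left, right] semantics; a range with left >= right wraps the ring.
        boolean contains(long token) {
            return left < right ? (token > left && token <= right)
                                : (token > left || token <= right);
        }
    }

    public static void main(String[] args) {
        Partitioner p = () -> "RandomPartitioner";
        Range wrapping = new Range(100, 10, p); // wraps around the ring
        System.out.println(wrapping.contains(5));   // true
        System.out.println(wrapping.contains(50));  // false
    }
}
```

The quick fix quoted in the issue (new Range(token, token, partitioner) instead of new Range(token, token)) follows exactly this shape, presuming the new Range shares its parent's partitioner.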
svn commit: r1165405 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/locator/PropertyFileSnitch.java
Author: jbellis Date: Mon Sep 5 19:58:27 2011 New Revision: 1165405 URL: http://svn.apache.org/viewvc?rev=1165405view=rev Log: avoid trying to watch cassandra-topology.properties when loaded from jar patch by Mck SembWever; reviewed by jbellis for CASSANDRA-3138 Modified: cassandra/branches/cassandra-0.8/CHANGES.txt cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/locator/PropertyFileSnitch.java Modified: cassandra/branches/cassandra-0.8/CHANGES.txt URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1165405r1=1165404r2=1165405view=diff == --- cassandra/branches/cassandra-0.8/CHANGES.txt (original) +++ cassandra/branches/cassandra-0.8/CHANGES.txt Mon Sep 5 19:58:27 2011 @@ -1,3 +1,8 @@ +0.8.6 + * avoid trying to watch cassandra-topology.properties when loaded from jar + (CASSANDRA-3138) + + 0.8.5 * fix NPE when encryption_options is unspecified (CASSANDRA-3007) * include column name in validation failure exceptions (CASSANDRA-2849) Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/locator/PropertyFileSnitch.java URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/locator/PropertyFileSnitch.java?rev=1165405r1=1165404r2=1165405view=diff == --- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/locator/PropertyFileSnitch.java (original) +++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/locator/PropertyFileSnitch.java Mon Sep 5 19:58:27 2011 @@ -58,14 +58,22 @@ public class PropertyFileSnitch extends public PropertyFileSnitch() throws ConfigurationException { reloadConfiguration(); -Runnable runnable = new WrappedRunnable() +try { -protected void runMayThrow() throws ConfigurationException +FBUtilities.resourceToFile(RACK_PROPERTY_FILENAME); +Runnable runnable = new WrappedRunnable() { -reloadConfiguration(); -} -}; -ResourceWatcher.watch(RACK_PROPERTY_FILENAME, runnable, 60 * 1000); +protected void runMayThrow() throws ConfigurationException +{ +reloadConfiguration(); +} +}; +ResourceWatcher.watch(RACK_PROPERTY_FILENAME, runnable, 60 * 1000); +} +catch (ConfigurationException ex) +{ +logger.debug(RACK_PROPERTY_FILENAME + " found, but does not look like a plain file. Will not watch it for changes"); +} } /**
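The committed pattern above probes whether the resource resolves to a plain file before registering a watcher, and degrades gracefully when it is served from a jar. A standalone analogue is sketched below; resourceToFile here is a stand-in for Cassandra's FBUtilities.resourceToFile, not the real implementation:

```java
import java.io.File;
import java.net.URL;

public class SnitchWatchSketch {
    // Analogue of FBUtilities.resourceToFile: resolve a classpath resource
    // to a plain file, or throw if it is absent or lives inside a jar.
    public static File resourceToFile(String name) throws Exception {
        URL url = SnitchWatchSketch.class.getClassLoader().getResource(name);
        if (url == null || !"file".equals(url.getProtocol()))
            throw new Exception("unable to locate " + name + " as a plain file");
        return new File(url.toURI());
    }

    public static void main(String[] args) {
        try {
            resourceToFile("cassandra-topology.properties");
            System.out.println("watching for changes");
        } catch (Exception ex) {
            // The fix: log once and skip watching, instead of failing
            // on every timer tick as in the original bug report.
            System.out.println("not a plain file; will not watch it");
        }
    }
}
```

Run standalone (with no such file on the classpath) this takes the catch branch, mirroring the debug-logged path the patch adds.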
[jira] [Resolved] (CASSANDRA-3138) PropertyFileSnitch's ResourceWatcher fails because it uses FBUtilities.resourceToFile(..) while PropertyFileSnitch uses classloader.getResourceAsStream(..)
[ https://issues.apache.org/jira/browse/CASSANDRA-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-3138. --- Resolution: Fixed Reviewer: jbellis committed (w/ logged message at debug), thanks! PropertyFileSnitch's ResourceWatcher fails because it uses FBUtilities.resourceToFile(..) while PropertyFileSnitch uses classloader.getResourceAsStream(..) --- Key: CASSANDRA-3138 URL: https://issues.apache.org/jira/browse/CASSANDRA-3138 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.0 Reporter: Mck SembWever Assignee: Mck SembWever Priority: Minor Fix For: 0.8.6 Attachments: CASSANDRA-3138.patch Resource files are not necessarily plain files. They could be inside a jar file. See CASSANDRA-2036 This will cause {noformat}ERROR 24:15,806 ResourceWatcher$WatchedResource: Timed run of class org.apache.cassandra.locator.PropertyFileSnitch$1 failed. org.apache.cassandra.config.ConfigurationException: unable to locate cassandra-topology.properties at org.apache.cassandra.utils.FBUtilities.resourceToFile(FBUtilities.java:467) at org.apache.cassandra.utils.ResourceWatcher$WatchedResource.run(ResourceWatcher.java:57) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662){noformat} --
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3138) PropertyFileSnitch's ResourceWatcher fails because it uses FBUtilities.resourceToFile(..) while PropertyFileSnitch uses classloader.getResourceAsStream(..)
[ https://issues.apache.org/jira/browse/CASSANDRA-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3138: -- Priority: Minor (was: Major) Affects Version/s: (was: 0.8.4) 0.7.0 Fix Version/s: 0.8.6 PropertyFileSnitch's ResourceWatcher fails because it uses FBUtilities.resourceToFile(..) while PropertyFileSnitch uses classloader.getResourceAsStream(..) --- Key: CASSANDRA-3138 URL: https://issues.apache.org/jira/browse/CASSANDRA-3138 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.0 Reporter: Mck SembWever Assignee: Mck SembWever Priority: Minor Fix For: 0.8.6 Attachments: CASSANDRA-3138.patch Resource files are not necessarily plain files. They could be inside a jar file. See CASSANDRA-2036 This will cause {noformat}RROR 24:15,806 ResourceWatcher$WatchedResource: Timed run of class org.apache.cassandra.locator.PropertyFileSnitch$1 failed. org.apache.cassandra.config.ConfigurationException: unable to locate cassandra-topology.properties at org.apache.cassandra.utils.FBUtilities.resourceToFile(FBUtilities.java:467) at org.apache.cassandra.utils.ResourceWatcher$WatchedResource.run(ResourceWatcher.java:57) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662){noformat} 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3138) PropertyFileSnitch's ResourceWatcher fails because it uses FBUtilities.resourceToFile(..) while PropertyFileSnitch uses classloader.getResourceAsStream(..)
[ https://issues.apache.org/jira/browse/CASSANDRA-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097309#comment-13097309 ] Hudson commented on CASSANDRA-3138: --- Integrated in Cassandra-0.8 #314 (See [https://builds.apache.org/job/Cassandra-0.8/314/]) avoid trying to watch cassandra-topology.properties when loaded from jar patch by Mck SembWever; reviewed by jbellis for CASSANDRA-3138 jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1165405 Files : * /cassandra/branches/cassandra-0.8/CHANGES.txt * /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/locator/PropertyFileSnitch.java PropertyFileSnitch's ResourceWatcher fails because it uses FBUtilities.resourceToFile(..) while PropertyFileSnitch uses classloader.getResourceAsStream(..) --- Key: CASSANDRA-3138 URL: https://issues.apache.org/jira/browse/CASSANDRA-3138 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.0 Reporter: Mck SembWever Assignee: Mck SembWever Priority: Minor Fix For: 0.8.6 Attachments: CASSANDRA-3138.patch Resource files are not necessarily plain files. They could be inside a jar file. See CASSANDRA-2036 This will cause {noformat}RROR 24:15,806 ResourceWatcher$WatchedResource: Timed run of class org.apache.cassandra.locator.PropertyFileSnitch$1 failed. 
org.apache.cassandra.config.ConfigurationException: unable to locate cassandra-topology.properties at org.apache.cassandra.utils.FBUtilities.resourceToFile(FBUtilities.java:467) at org.apache.cassandra.utils.ResourceWatcher$WatchedResource.run(ResourceWatcher.java:57) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662){noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-3141) SSTableSimpleUnsortedWriter call to ColumnFamily.serializedSize iterate through the whole columns
SSTableSimpleUnsortedWriter call to ColumnFamily.serializedSize iterate through the whole columns - Key: CASSANDRA-3141 URL: https://issues.apache.org/jira/browse/CASSANDRA-3141 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8.3 Reporter: Benoit Perroud Priority: Minor Every time newRow is called, serializedSize iterates through all the columns to compute the size. Once 1'000'000 columns exist in the CF, it becomes painful to repeat the same computation on every call. Caching the size and incrementing it when a Column is added could be an option. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
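The caching idea suggested in the ticket might look like the following toy sketch; CachedSizeCf and its byte-array columns are illustrative stand-ins for the real ColumnFamily and IColumn types, not the actual Cassandra classes:

```java
import java.util.ArrayList;
import java.util.List;

// Keep a running serialized-size counter updated on each addColumn instead
// of re-walking every column each time newRow asks for the size.
public class CachedSizeCf {
    private final List<byte[]> columns = new ArrayList<>();
    private long cachedSerializedSize = 0;

    public void addColumn(byte[] column) {
        columns.add(column);
        cachedSerializedSize += column.length; // O(1) incremental update
    }

    // O(1) instead of O(number of columns) per call.
    public long serializedSize() {
        return cachedSerializedSize;
    }

    public static void main(String[] args) {
        CachedSizeCf cf = new CachedSizeCf();
        cf.addColumn(new byte[10]);
        cf.addColumn(new byte[22]);
        System.out.println(cf.serializedSize()); // 32
    }
}
```

The trade-off is that any mutation path (column overwrite, removal) must also adjust the counter, or the cached value drifts from the true serialized size.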
[jira] [Updated] (CASSANDRA-3139) Prevent users from creating keyspaces with LocalStrategy replication
[ https://issues.apache.org/jira/browse/CASSANDRA-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Yaskevich updated CASSANDRA-3139: --- Attachment: CASSANDRA-3139.patch Prevent users from creating keyspaces with LocalStrategy replication Key: CASSANDRA-3139 URL: https://issues.apache.org/jira/browse/CASSANDRA-3139 Project: Cassandra Issue Type: Bug Components: API Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Priority: Minor Fix For: 0.8.6 Attachments: CASSANDRA-3139.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3139) Prevent users from creating keyspaces with LocalStrategy replication
[ https://issues.apache.org/jira/browse/CASSANDRA-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097323#comment-13097323 ] Jonathan Ellis commented on CASSANDRA-3139: --- LocalStrategy isn't deprecated; it's just reserved for internal use. Prevent users from creating keyspaces with LocalStrategy replication Key: CASSANDRA-3139 URL: https://issues.apache.org/jira/browse/CASSANDRA-3139 Project: Cassandra Issue Type: Bug Components: API Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Priority: Minor Fix For: 0.8.6 Attachments: CASSANDRA-3139.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3139) Prevent users from creating keyspaces with LocalStrategy replication
[ https://issues.apache.org/jira/browse/CASSANDRA-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Yaskevich updated CASSANDRA-3139: --- Attachment: (was: CASSANDRA-3139.patch) Prevent users from creating keyspaces with LocalStrategy replication Key: CASSANDRA-3139 URL: https://issues.apache.org/jira/browse/CASSANDRA-3139 Project: Cassandra Issue Type: Bug Components: API Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Priority: Minor Fix For: 0.8.6 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3139) Prevent users from creating keyspaces with LocalStrategy replication
[ https://issues.apache.org/jira/browse/CASSANDRA-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Yaskevich updated CASSANDRA-3139: --- Attachment: CASSANDRA-3139.patch error message is fixed. Prevent users from creating keyspaces with LocalStrategy replication Key: CASSANDRA-3139 URL: https://issues.apache.org/jira/browse/CASSANDRA-3139 Project: Cassandra Issue Type: Bug Components: API Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Priority: Minor Fix For: 0.8.6 Attachments: CASSANDRA-3139.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3139) Prevent users from creating keyspaces with LocalStrategy replication
[ https://issues.apache.org/jira/browse/CASSANDRA-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097561#comment-13097561 ] Jonathan Ellis commented on CASSANDRA-3139: --- +1 Prevent users from creating keyspaces with LocalStrategy replication Key: CASSANDRA-3139 URL: https://issues.apache.org/jira/browse/CASSANDRA-3139 Project: Cassandra Issue Type: Bug Components: API Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Priority: Minor Fix For: 0.8.6 Attachments: CASSANDRA-3139.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
svn commit: r1165438 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/thrift/ThriftValidation.java test/unit/org/apache/cassandra/cli/CliTest.java test/unit/org/apache
Author: xedin Date: Mon Sep 5 22:21:01 2011 New Revision: 1165438 URL: http://svn.apache.org/viewvc?rev=1165438view=rev Log: Prevent users from creating keyspaces with LocalStrategy replication patch by Pavel Yaskevich; reviewed by Jonathan Ellis for CASSANDRA-3139 Modified: cassandra/branches/cassandra-0.8/CHANGES.txt cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/cli/CliTest.java cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/thrift/ThriftValidationTest.java Modified: cassandra/branches/cassandra-0.8/CHANGES.txt URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1165438r1=1165437r2=1165438view=diff == --- cassandra/branches/cassandra-0.8/CHANGES.txt (original) +++ cassandra/branches/cassandra-0.8/CHANGES.txt Mon Sep 5 22:21:01 2011 @@ -1,7 +1,8 @@ 0.8.6 * avoid trying to watch cassandra-topology.properties when loaded from jar (CASSANDRA-3138) - + * prevent users from creating keyspaces with LocalStrategy replication + (CASSANDRA-3139) 0.8.5 * fix NPE when encryption_options is unspecified (CASSANDRA-3007) Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java?rev=1165438r1=1165437r2=1165438view=diff == --- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java (original) +++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java Mon Sep 5 22:21:01 2011 @@ -27,6 +27,7 @@ import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.apache.cassandra.config.*; +import org.apache.cassandra.locator.*; import org.apache.cassandra.db.*; import org.apache.cassandra.db.marshal.AbstractType; import org.apache.cassandra.db.marshal.AsciiType; @@ -36,10 +37,6 @@ import 
org.apache.cassandra.db.migration import org.apache.cassandra.dht.IPartitioner; import org.apache.cassandra.dht.RandomPartitioner; import org.apache.cassandra.dht.Token; -import org.apache.cassandra.locator.AbstractReplicationStrategy; -import org.apache.cassandra.locator.IEndpointSnitch; -import org.apache.cassandra.locator.NetworkTopologyStrategy; -import org.apache.cassandra.locator.TokenMetadata; import org.apache.cassandra.service.StorageService; import org.apache.cassandra.utils.ByteBufferUtil; import org.apache.cassandra.utils.FBUtilities; @@ -671,6 +668,10 @@ public class ThriftValidation TokenMetadata tmd = StorageService.instance.getTokenMetadata(); IEndpointSnitch eps = DatabaseDescriptor.getEndpointSnitch(); Class<? extends AbstractReplicationStrategy> cls = AbstractReplicationStrategy.getClass(ks_def.strategy_class); + +if (cls.equals(LocalStrategy.class)) +throw new ConfigurationException("Unable to use given strategy class: LocalStrategy is reserved for internal use."); + AbstractReplicationStrategy.createReplicationStrategy(ks_def.name, cls, tmd, eps, options); } Modified: cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/cli/CliTest.java URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/cli/CliTest.java?rev=1165438r1=1165437r2=1165438view=diff == --- cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/cli/CliTest.java (original) +++ cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/cli/CliTest.java Mon Sep 5 22:21:01 2011 @@ -124,8 +124,7 @@ public class CliTest extends CleanupHelp "drop index on '123'.617070;", "drop index on '123'.'-617071';", "drop index on CF3.'big world';", -"update keyspace TestKeySpace with placement_strategy='org.apache.cassandra.locator.LocalStrategy' and durable_writes = false;", -"update keyspace TestKeySpace with strategy_options=[{DC1:3, DC2:4, DC5:1}];", +"update keyspace TestKeySpace with durable_writes = false;", "assume 123 comparator as utf8;",
"assume 123 sub_comparator as integer;", "assume 123 validator as lexicaluuid;", @@ -166,6 +165,8 @@ public class CliTest extends CleanupHelp "get myCF['key']['scName']", "assume CF3 keys as utf8;", "use TestKEYSpace;", +"update keyspace TestKeySpace with placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy';", +"update keyspace TestKeySpace with strategy_options=[{DC1:3, DC2:4, DC5:1}];", "describe cluster;", "help describe cluster;", "show cluster name", Modified:
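The validation added by this commit can be rendered as a standalone sketch. The stub strategy classes below are placeholders for Cassandra's real locator classes, and IllegalArgumentException stands in for ConfigurationException:

```java
public class StrategyValidation {
    // Stubs standing in for org.apache.cassandra.locator classes.
    public static class LocalStrategy {}
    public static class NetworkTopologyStrategy {}

    // Mirror of the ThriftValidation check: reject keyspace definitions
    // whose replication strategy is LocalStrategy.
    public static void validateStrategy(Class<?> cls) {
        if (cls.equals(LocalStrategy.class))
            throw new IllegalArgumentException(
                "Unable to use given strategy class: LocalStrategy is reserved for internal use.");
    }

    public static void main(String[] args) {
        validateStrategy(NetworkTopologyStrategy.class); // passes silently
        try {
            validateStrategy(LocalStrategy.class);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

As the follow-up comment notes, LocalStrategy is not deprecated; it is simply reserved for internal use (system keyspaces), so the check belongs at the user-facing API boundary.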
[Cassandra Wiki] Update of Operations by vijay2win
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The Operations page has been changed by vijay2win: http://wiki.apache.org/cassandra/Operations?action=diffrev1=95rev2=96 The status of move and balancing operations can be monitored using `nodetool` with the `netstat` argument. (Cassandra 0.6.* and lower use the `streams` argument). + === Replacing a Dead Node (with same token): === + + Since Cassandra 1.0 we can replace an existing node with a new node using the property cassandra.replace_token=Token. This property can be set with the -D option when starting the Cassandra daemon process. + + (Note: this property takes effect only when the node doesn't have any data in it. You might want to empty the data dir if you want to force the node replacement.) + + You must use this property only when replacing a dead node (if you try to replace an existing live node, the bootstrapping node will throw an Exception). + The token used via this property must already be part of the ring, and the node that owned it must have died. + + Once this property is set, the node starts in a hibernate state, during which all the other nodes will see this node as down. The new node will then start to bootstrap the data from the rest of the nodes in the cluster (the main difference from normal bootstrapping of a new node is that this new node will not accept any writes during this phase). Once the bootstrapping is complete the node will be marked UP; we rely on hinted handoff to make this node consistent (since it has not accepted writes since the start of the bootstrap). + + Note: we strongly suggest repairing the node once the bootstrap is completed, because hinted handoff is a best effort and not a guarantee. + == Consistency == Cassandra allows clients to specify the desired consistency level on reads and writes. (See [[API]].) 
If R + W > N, where R, W, and N are respectively the read replica count, the write replica count, and the replication factor, all client reads will see the most recent write. Otherwise, readers '''may''' see older versions, for periods of typically a few ms; this is called eventual consistency. See http://www.allthingsdistributed.com/2008/12/eventually_consistent.html and http://queue.acm.org/detail.cfm?id=1466448 for more. @@ -225, +238 @@ NOTE: Starting with version 0.7, json2sstable and sstable2json must be run in such a way that the schema can be loaded from system tables. This means that cassandra.yaml must be found in the classpath and refer to valid storage directories. == Monitoring == - Running `nodetool cfstats` can provide an overview of each Column Family, and important metrics to graph your cluster. Cassandra also exposes internal metrics as JMX data. This is a common standard in the JVM world; OpenNMS, Nagios, and Munin at least offer some level of JMX support. For a non-stupid JMX plugin for Munin check out https://github.com/tcurdt/jmx2munin + Running `nodetool cfstats` can provide an overview of each Column Family, and important metrics to graph your cluster. Cassandra also exposes internal metrics as JMX data. This is a common standard in the JVM world; OpenNMS, Nagios, and Munin at least offer some level of JMX support. For a non-stupid JMX plugin for Munin check out https://github.com/tcurdt/jmx2munin - The specifics of the JMX Interface are documented at JmxInterface. + The specifics of the JMX Interface are documented at JmxInterface. For folks who prefer not to deal with JMX clients, there is a JMX-to-REST bridge available at http://code.google.com/p/polarrose-jmx-rest-bridge/ Bridging to SNMP is a bit more work but can be done with https://github.com/tcurdt/jmx2snmp
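The R + W > N rule quoted above is easy to check numerically: with RF N = 3, QUORUM reads and writes both use floor(N/2) + 1 = 2 replicas, and 2 + 2 > 3, so read and write sets must overlap. A toy sketch (class name made up) computes the quorum size and the overlap condition:

```java
public class QuorumOverlap {
    // QUORUM replica count for a given replication factor: a strict majority.
    public static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // The rule from the wiki: reads see the latest write iff R + W > N,
    // because then every read set intersects every write set.
    public static boolean stronglyConsistent(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);                               // 2
        System.out.println(q);
        System.out.println(stronglyConsistent(q, q, n)); // true: QUORUM/QUORUM
        System.out.println(stronglyConsistent(1, 1, n)); // false: ONE/ONE is eventual
    }
}
```

The same arithmetic explains the eventual-consistency caveat in the text: at CL.ONE both ways, 1 + 1 is not greater than 3, so a read may land on a replica the write never reached.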
[jira] [Commented] (CASSANDRA-3139) Prevent users from creating keyspaces with LocalStrategy replication
[ https://issues.apache.org/jira/browse/CASSANDRA-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097626#comment-13097626 ] Hudson commented on CASSANDRA-3139: --- Integrated in Cassandra-0.8 #315 (See [https://builds.apache.org/job/Cassandra-0.8/315/]) Prevent users from creating keyspaces with LocalStrategy replication patch by Pavel Yaskevich; reviewed by Jonathan Ellis for CASSANDRA-3139 xedin : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1165438 Files : * /cassandra/branches/cassandra-0.8/CHANGES.txt * /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java * /cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/cli/CliTest.java * /cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/thrift/ThriftValidationTest.java Prevent users from creating keyspaces with LocalStrategy replication Key: CASSANDRA-3139 URL: https://issues.apache.org/jira/browse/CASSANDRA-3139 Project: Cassandra Issue Type: Bug Components: API Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Priority: Minor Fix For: 0.8.6 Attachments: CASSANDRA-3139.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency
[ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097625#comment-13097625 ] Zhu Han commented on CASSANDRA-2434: bq. Also if only one node is down you should still be able to read/write at quorum and achieve consistency I suppose quorum read plus quorum write should provide monotonic read consistency. [1] Suppose a quorum write on key1 hits node A and node B, but not node C, due to a temporary network partition. After that, node B is replaced by node D because it is down, and node D streams data from node C. If the following quorum read on key1 hits only node C and node D, monotonic consistency is violated. This is rare but not unrealistic, especially when hinted handoff is disabled. Maybe it is more reasonable to give the admin an option to specify that the bootstrapped node should not accept any read requests until the admin turns it on manually. The admin can then start a manual repair if he wants to make sure everything is fine. [1] http://www.allthingsdistributed.com/2007/12/eventually_consistent.html node bootstrapping can violate consistency -- Key: CASSANDRA-2434 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434 Project: Cassandra Issue Type: Bug Reporter: Peter Schuller Assignee: paul cannon Fix For: 1.1 Attachments: 2434.patch.txt My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up. There are at least two issues I have with this behavior: * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping completes based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster. 
* Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and thinks about it carefully before bootstrapping, there's no guarantee. A complication is that it depends on the use-case whether this is an issue (if all you ever do is at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations. A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given. (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DCs while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.) Thoughts? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
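The "assume QUORUM if none is given" default proposed above is just a simple majority of the replication factor; a small sketch of that arithmetic (names are hypothetical, not the Cassandra bootstrap code):

```java
// Quorum size for a given replication factor: a strict majority of replicas.
// With RF=3, a bootstrap honoring QUORUM semantics would need to stream from
// at least 2 replicas rather than whichever single neighbor happens to be up.
public class BootstrapQuorum {
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(BootstrapQuorum.quorum(3)); // 2
        System.out.println(BootstrapQuorum.quorum(5)); // 3
        System.out.println(BootstrapQuorum.quorum(1)); // 1
    }
}
```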
[jira] [Commented] (CASSANDRA-3140) Expose server, api versions to CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097660#comment-13097660 ] Rick Shaw commented on CASSANDRA-3140: -- +1 for this approach. A generalized interface for methods that can occur as a pseudo-column would be worth discussing. Expose server, api versions to CQL -- Key: CASSANDRA-3140 URL: https://issues.apache.org/jira/browse/CASSANDRA-3140 Project: Cassandra Issue Type: New Feature Reporter: Jonathan Ellis Priority: Minor Fix For: 1.0 Need to expose the CQL api version; might as well include the server version while we're at it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3118) nodetool can not decommission a node
[ https://issues.apache.org/jira/browse/CASSANDRA-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097665#comment-13097665 ] deng commented on CASSANDRA-3118: - But there is another problem. After nodeA was decommissioned, I changed the seeds from 100.86.12.224 to 127.0.0.1 in cassandra.yaml; the listen_address and rpc_address were still 100.86.17.9. I restarted the nodeA server, but nodeA automatically joined the cluster, even after I installed a fresh Cassandra 0.8.4. Why? nodeA had already been decommissioned from the cluster. Then I changed the listen_address and rpc_address from 100.86.17.9 to localhost and restarted the nodeA server; this time nodeA could not automatically join the cluster. Why? nodetool can not decommission a node -- Key: CASSANDRA-3118 URL: https://issues.apache.org/jira/browse/CASSANDRA-3118 Project: Cassandra Issue Type: Bug Components: Tools Affects Versions: 0.8.4 Environment: Cassandra 0.8.4 Reporter: deng Attachments: 3118-debug.txt When I run nodetool ring I get the result below, and then I want to decommission the 100.86.17.90 node, but I get this error: [root@ip bin]# ./nodetool -h10.86.12.225 ring Address DC RackStatus State LoadOwns Token 154562542458917734942660802527609328132 100.86.17.90 datacenter1 rack1 Up Leaving 1.08 MB 11.21% 3493450320433654773610109291263389161 100.86.12.225datacenter1 rack1 Up Normal 558.25 MB 14.25% 27742979166206700793970535921354744095 100.86.12.224datacenter1 rack1 Up Normal 5.01 GB 6.58% 38945137636148605752956920077679425910 ERROR: root@ip bin]# ./nodetool -h100.86.17.90 decommission Exception in thread main java.lang.UnsupportedOperationException at java.util.AbstractList.remove(AbstractList.java:144) at java.util.AbstractList$Itr.remove(AbstractList.java:360) at java.util.AbstractCollection.removeAll(AbstractCollection.java:337) at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:1041) at 
org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:1006) at org.apache.cassandra.service.StorageService.handleStateLeaving(StorageService.java:877) at org.apache.cassandra.service.StorageService.onChange(StorageService.java:732) at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:839) at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:986) at org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1836) at org.apache.cassandra.service.StorageService.decommission(StorageService.java:1855) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1426) at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1264) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1359) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at
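The top of the stack trace above (AbstractList.remove → AbstractList$Itr.remove → AbstractCollection.removeAll) is the JDK's signature for calling removeAll on a fixed-size List, whose iterator does not support remove(). A standalone reproduction of that failure mode, with the usual fix of copying into a mutable list first (this is plain JDK behavior, not Cassandra code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Arrays.asList returns a fixed-size AbstractList; removeAll (inherited from
// AbstractCollection) iterates and calls Iterator.remove, which throws
// UnsupportedOperationException — the same chain as the decommission trace.
public class FixedSizeListDemo {
    static boolean removeAllThrows(List<Integer> list) {
        try {
            list.removeAll(Collections.singleton(2));
            return false;
        } catch (UnsupportedOperationException e) {
            return true; // fixed-size list: element 2 found, removal attempted, throws
        }
    }

    public static void main(String[] args) {
        List<Integer> fixed = Arrays.asList(1, 2, 3);                    // fixed-size view
        List<Integer> mutable = new ArrayList<>(Arrays.asList(1, 2, 3)); // defensive copy
        System.out.println(removeAllThrows(fixed));   // true
        System.out.println(removeAllThrows(mutable)); // false: the copy supports removal
    }
}
```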
[Cassandra Wiki] Update of Operations by JonathanEllis
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The Operations page has been changed by JonathanEllis: http://wiki.apache.org/cassandra/Operations?action=diff&rev1=96&rev2=97 Using a strong hash function means !RandomPartitioner keys will, on average, be evenly spread across the Token space, but you can still have imbalances if your Tokens do not divide up the range evenly, so you should specify !InitialToken to your first nodes as `i * (2**127 / N)` for i = 0 .. N-1. In Cassandra 0.7, you should specify `initial_token` in `cassandra.yaml`. With !NetworkTopologyStrategy, you should calculate the tokens for the nodes in each DC independently. Tokens still need to be unique, so you can add 1 to the tokens in the 2nd DC, add 2 in the 3rd, and so on. Thus, for a 4-node cluster in 2 datacenters, you would have + {{{ DC1 node 1 = 0 @@ -33, +34 @@ node 3 = 1 node 4 = 85070591730234615865843651857942052865 }}} - - If you happen to have the same number of nodes in each data center, you can also alternate data centers when assigning tokens: + {{{ [DC1] node 1 = 0 [DC2] node 2 = 42535295865117307932921825928971026432 [DC1] node 3 = 85070591730234615865843651857942052864 [DC2] node 4 = 127605887595351923798765477786913079296 }}} - With order preserving partitioners, your key distribution will be application-dependent. You should still take your best guess at specifying initial tokens (guided by sampling actual data, if possible), but you will be more dependent on active load balancing (see below) and/or adding new nodes to hot spots. Once data is placed on the cluster, the partitioner may not be changed without wiping and starting over. @@ -127, +126 @@ The status of move and balancing operations can be monitored using `nodetool` with the `netstats` argument. (Cassandra 0.6.* and lower use the `streams` argument). 
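The `i * (2**127 / N)` rule plus the per-DC offset described above can be computed mechanically; a small sketch (class and method names are illustrative):

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Token assignment per the formula on the page: divide the RandomPartitioner
// token space (2**127) evenly among the N nodes of a datacenter, then add a
// small per-DC offset (0 for DC1, 1 for DC2, ...) to keep tokens unique.
public class InitialTokens {
    static final BigInteger RING_SIZE = BigInteger.valueOf(2).pow(127);

    static List<BigInteger> tokens(int nodesInDc, int dcOffset) {
        BigInteger step = RING_SIZE.divide(BigInteger.valueOf(nodesInDc));
        List<BigInteger> result = new ArrayList<>();
        for (int i = 0; i < nodesInDc; i++) {
            result.add(step.multiply(BigInteger.valueOf(i)).add(BigInteger.valueOf(dcOffset)));
        }
        return result;
    }

    public static void main(String[] args) {
        // DC1 (offset 0): [0, 85070591730234615865843651857942052864]
        System.out.println(tokens(2, 0));
        // DC2 (offset 1): [1, 85070591730234615865843651857942052865]
        System.out.println(tokens(2, 1));
    }
}
```

With two nodes per DC this reproduces the token values shown in the wiki example above.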
- === Replacing a Dead Node (with same token): === + === Replacing a Dead Node === - - Since Cassandra 1.0 we can replace an existing node with a new node using the property cassandra.replace_token=Token, This property can be set using -D option while starting cassandra demon process. + Since Cassandra 1.0 we can replace a dead node with a new one using the property cassandra.replace_token=<Token>. This property can be set with the -D option when starting the Cassandra daemon process. (Note: this property takes effect only when the node doesn't have any data in it; you might want to empty the data directory if you want to force the node replacement.) + You must use this property only when replacing a dead node (if you try to replace an existing live node, the bootstrapping node will throw an exception). The token passed via this property must already be part of the ring, and its node must have died. - You must use this property when replacing a dead node (If tried to replace an existing live node, the bootstrapping node will throw a Exception). - The token used via this property must be part of the ring and the node have died due to various reasons. Once this property is enabled the node starts in a hibernate state, during which all the other nodes will see this node as down. The new node will then start to bootstrap the data from the rest of the nodes in the cluster (the main difference from normal bootstrapping of a new node is that this node will not accept any writes during this phase). Once the bootstrapping is complete the node will be marked UP; we rely on hinted handoffs to make this node consistent (since we don't accept writes from the start of the bootstrap). @@ -238, +235 @@ NOTE: Starting with version 0.7, json2sstable and sstable2json must be run in such a way that the schema can be loaded from system tables. This means that cassandra.yaml must be found in the classpath and refer to valid storage directories. 
== Monitoring == - Running `nodetool cfstats` can provide an overview of each Column Family, and important metrics to graph your cluster. Cassandra also exposes internal metrics as JMX data. This is a common standard in the JVM world; OpenNMS, Nagios, and Munin at least offer some level of JMX support. For a non-stupid JMX plugin for Munin check out https://github.com/tcurdt/jmx2munin + Running `nodetool cfstats` can provide an overview of each Column Family, and important metrics to graph your cluster. Cassandra also exposes internal metrics as JMX data. This is a common standard in the JVM world; OpenNMS, Nagios, and Munin at least offer some level of JMX support. For a non-stupid JMX plugin for Munin check out https://github.com/tcurdt/jmx2munin The specifics of the JMX Interface are documented at JmxInterface. - The specifics of the JMX Interface are documented at JmxInterface. Some folks prefer not having to deal with JMX clients; for them there is a JMX-to-REST bridge available at
[jira] [Commented] (CASSANDRA-3050) Global row cache
[ https://issues.apache.org/jira/browse/CASSANDRA-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097685#comment-13097685 ] Jonathan Ellis commented on CASSANDRA-3050: --- Hmm. I guess we could have the cache provider include a sizeInMemory method? For serialized off-heap cache we can just use the FreeableMemory size(). For on-heap cache we can use the serializedSize * liveRatio from the CF's memtable. Global row cache Key: CASSANDRA-3050 URL: https://issues.apache.org/jira/browse/CASSANDRA-3050 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Priority: Minor Fix For: 1.1 Row-cache-per-columnfamily is difficult to configure well as columnfamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency
[ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097695#comment-13097695 ] paul cannon commented on CASSANDRA-2434: bq. The suggestion was that if the 'correct' node is down, you can force the bootstrap to complete anyway (probably from the closest node, but that is transparent to the user), but only if the 'correct' node is down. Oh, ok. I misunderstood. This seems reasonable. I'd lean toward the more general solution, yeah, but I don't feel very strongly about it. node bootstrapping can violate consistency -- Key: CASSANDRA-2434 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434 Project: Cassandra Issue Type: Bug Reporter: Peter Schuller Assignee: paul cannon Fix For: 1.1 Attachments: 2434.patch.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency
[ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097703#comment-13097703 ] Zhu Han commented on CASSANDRA-2434: As Peter suggested before, another approach to fix the consistency problem is streaming sstables from all alive peers if the correct node is down, and then leaving them to normal compaction. This would be much more lightweight than anti-entropy repair, except for the network IO pressure on the bootstrapping node. node bootstrapping can violate consistency -- Key: CASSANDRA-2434 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434 Project: Cassandra Issue Type: Bug Reporter: Peter Schuller Assignee: paul cannon Fix For: 1.1 Attachments: 2434.patch.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-3142) CustomTThreadPoolServer should log TTransportException at DEBUG level
CustomTThreadPoolServer should log TTransportException at DEBUG level - Key: CASSANDRA-3142 URL: https://issues.apache.org/jira/browse/CASSANDRA-3142 Project: Cassandra Issue Type: Bug Reporter: Jim Ancona Currently CustomTThreadPoolServer, like the Thrift TThreadPoolServer, silently ignores TTransportException in its run() method. This is appropriate in most cases because TTransportException occurs fairly often when client connections die. However TTransportException is also thrown when TFramedTransport encounters a frame that is larger than thrift_framed_transport_size_in_mb. In that case, silently exiting the run loop leads to a SocketException on the client side which can be both difficult to diagnose, in part because nothing is logged by Cassandra, and high-impact, because the client may respond by marking the server node down and retrying the too-large request on another node, where it also fails. Repeated, this process leads to the entire cluster being marked down (see https://github.com/rantav/hector/issues/212). I've filed two Thrift issues (https://issues.apache.org/jira/browse/THRIFT-1323 and https://issues.apache.org/jira/browse/THRIFT-1324), but in the meantime, I suggest that CustomTThreadPoolServer log the exception at DEBUG level in order to support easier troubleshooting. I can submit a patch with the added log message. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
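The change proposed above amounts to replacing a silent catch with a DEBUG-level log call. A hypothetical sketch of that pattern: the exception class below is a stand-in for org.apache.thrift.transport.TTransportException, and the class, method, and logger names are illustrative, not the actual Cassandra source.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of the proposed fix: instead of swallowing the transport exception in
// the worker's run() loop, record it at DEBUG (FINE in java.util.logging) so
// that oversized-frame failures leave a trace on the server side.
public class WorkerSketch {
    private static final Logger logger = Logger.getLogger("CustomTThreadPoolServer");

    // Stand-in for org.apache.thrift.transport.TTransportException.
    static class TTransportException extends Exception {
        TTransportException(String msg) { super(msg); }
    }

    static String handle(TTransportException e) {
        String msg = "Thrift transport error during processing: " + e.getMessage();
        logger.log(Level.FINE, msg, e); // previously: silently ignored
        return msg;
    }

    public static void main(String[] args) {
        System.out.println(handle(new TTransportException(
                "frame larger than thrift_framed_transport_size_in_mb")));
    }
}
```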
[jira] [Commented] (CASSANDRA-957) convenience workflow for replacing dead node
[ https://issues.apache.org/jira/browse/CASSANDRA-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097708#comment-13097708 ] Hudson commented on CASSANDRA-957: -- Integrated in Cassandra #1076 (See [https://builds.apache.org/job/Cassandra/1076/]) convenience workflow for replacing dead node patch by Vijay; reviewed by Nick Bailey for CASSANDRA-957 jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1165468 Files : * /cassandra/trunk/NEWS.txt * /cassandra/trunk/src/java/org/apache/cassandra/config/DatabaseDescriptor.java * /cassandra/trunk/src/java/org/apache/cassandra/db/HintedHandOffManager.java * /cassandra/trunk/src/java/org/apache/cassandra/db/RowMutation.java * /cassandra/trunk/src/java/org/apache/cassandra/dht/BootStrapper.java * /cassandra/trunk/src/java/org/apache/cassandra/gms/EndpointState.java * /cassandra/trunk/src/java/org/apache/cassandra/gms/Gossiper.java * /cassandra/trunk/src/java/org/apache/cassandra/gms/VersionedValue.java * /cassandra/trunk/src/java/org/apache/cassandra/service/LoadBroadcaster.java * /cassandra/trunk/src/java/org/apache/cassandra/service/MigrationManager.java * /cassandra/trunk/src/java/org/apache/cassandra/service/StorageProxy.java * /cassandra/trunk/src/java/org/apache/cassandra/service/StorageService.java convenience workflow for replacing dead node Key: CASSANDRA-957 URL: https://issues.apache.org/jira/browse/CASSANDRA-957 Project: Cassandra Issue Type: Wish Components: Core, Tools Affects Versions: 0.8.2 Reporter: Jonathan Ellis Assignee: Vijay Fix For: 1.0 Attachments: 0001-Support-bringing-back-a-node-to-the-cluster-that-exi.patch, 0001-adding-NEWS.patch, 0001-support-for-replace-token-v3.patch, 0001-support-token-replace-v4.patch, 0001-support-token-replace-v5.patch, 0001-support-token-replace-v6.patch, 0001-support-token-replace-v7.patch, 0002-Do-not-include-local-node-when-computing-workMap.patch, 0002-hints-on-token-than-ip-v4.patch, 
0002-hints-on-token-than-ip-v5.patch, 0002-hints-on-token-than-ip-v6.patch, 0002-upport-for-hints-on-token-v3.patch Original Estimate: 24h Remaining Estimate: 24h Replacing a dead node with a new one is a common operation, but nodetool removetoken followed by bootstrap is inefficient (re-replicating data first to the remaining nodes, then to the new one) and manually bootstrapping to a token just less than the old one's, followed by nodetool removetoken is slightly painful and prone to manual errors. First question: how would you expose this in our tool ecosystem? It needs to be a startup-time option to the new node, so it can't be nodetool, and messing with the config xml definitely takes the convenience out. A one-off -DreplaceToken=XXY argument? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2936) improve dependency situation between JDBC driver and Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2936: -- Attachment: 2936-cleanup.txt Ugh, I wish we hadn't touched the AbstractType->AbstractTerm refactor. It doesn't improve things from the dependency standpoint (the latter still depends on the former) and we should be avoiding 11th hour refactors like this where possible (e.g. this screwed CASSANDRA-2734 all to hell). Having come this far, though, I propose the attached patch: - removes ATerm.isCommutative, which is unused and likely to remain so (commutativity is an internal property of counters) - removes AType.toString, which is unused outside of client code, which leaves us with a single-direction dependency instead of bidirectional I further propose renaming AbstractTerm to AbstractJdbcType, and LongTerm, IntegerTerm, etc., to JdbcLong, JdbcInteger, etc., both on semantic grounds (a term implies a concrete use in a parse tree or statement, not a generic type) and pedantic (it's unfortunate that the CamelCase abbreviations of *Type and *Term are identical). improve dependency situation between JDBC driver and Cassandra -- Key: CASSANDRA-2936 URL: https://issues.apache.org/jira/browse/CASSANDRA-2936 Project: Cassandra Issue Type: Improvement Components: API, Core Affects Versions: 0.8.1 Reporter: Eric Evans Assignee: Eric Evans Priority: Minor Labels: cql Fix For: 1.0 Attachments: 2936-cleanup.txt, v1-0001-CASSANDRA-2936-rename-cookie-jar-clientutil.txt, v3-0001-CASSANDRA-2936-create-package-for-CQL-term-marshaling.txt, v3-0002-convert-drivers-and-tests-to-o.a.c.cql.term.txt, v3-0003-remove-extraneous-methods-from-o.a.c.db.marshal-classe.txt, v3-0004-make-better-reuse-of-new-classes.txt, v3-0005-create-jar-file.txt The JDBC jar currently depends on the {{apache-cassandra-$version}} jar, despite the fact that it only (directly) uses a handful of Cassandra's classes. 
In a perfect world, we'd break those classes out into their own jar which both the JDBC driver and Cassandra (ala {{apache-cassandra-$version.jar}}) could depend on. However, the classes used directly don't fall out to anything that makes much sense organizationally (short of creating a {{apache-cassandra-misc-$version.jar}}), and the situation only gets worse when you take into account all of the transitive dependencies. See CASSANDRA-2761 for more background, in particular ([this|https://issues.apache.org/jira/browse/CASSANDRA-2761?focusedCommentId=13048734page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13048734] and [this|https://issues.apache.org/jira/browse/CASSANDRA-2761?focusedCommentId=13050884page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13050884]) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira