[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-06-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568708#comment-14568708
 ] 

Piotr Kołaczkowski commented on CASSANDRA-7688:
---

So what is this JMX call to refresh the estimates?
I tried setting {{-Dcassandra.size_recorder_interval=1}} (I know, a ridiculously 
low value), and the size estimates table is still empty when running the tests.

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
Assignee: Aleksey Yeschenko
 Fix For: 2.1.5

 Attachments: 7688.txt


 Currently you can't implement something similar to describe_splits_ex purely 
 from a native protocol driver.  
 https://datastax-oss.atlassian.net/browse/JAVA-312 is open to make ownership 
 information easily available to clients of the java-driver.  But you still 
 need the data sizing part to get splits of a given size.  We should add the 
 sizing information to a system table so that native clients can get to it.
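
To make the sizing use case concrete, here is a minimal Java sketch of how a client might turn one token range's estimates into splits of a given size. It assumes the per-range values are a partition count and a mean partition size in bytes (the partitions_count and mean_partition_size columns written by the patch reviewed further down this thread), long (Murmur3-style) tokens, and a non-wrapping range. SplitEstimator and its methods are hypothetical names; this is not how describe_splits_ex or any driver actually computes splits.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Cassandra or any driver): turns one token
// range's size estimates into evenly spaced sub-ranges of roughly
// targetSplitBytes each. Assumes long (Murmur3-style) tokens and a
// non-wrapping range, i.e. left < right.
final class SplitEstimator {
    private SplitEstimator() {}

    /** Splits needed so each holds about targetSplitBytes of data (at least 1). */
    static long splitCount(long partitionsCount, long meanPartitionSize, long targetSplitBytes) {
        if (targetSplitBytes <= 0) {
            throw new IllegalArgumentException("targetSplitBytes must be positive");
        }
        // Estimated bytes in the range = partitions_count * mean_partition_size.
        BigInteger total = BigInteger.valueOf(partitionsCount)
                .multiply(BigInteger.valueOf(meanPartitionSize));
        BigInteger target = BigInteger.valueOf(targetSplitBytes);
        // Ceiling division: (total + target - 1) / target.
        long splits = total.add(target).subtract(BigInteger.ONE).divide(target).longValue();
        return Math.max(1L, splits);
    }

    /** Boundaries of n equal sub-ranges of (left, right], returned as n + 1 tokens. */
    static List<Long> splitRange(long left, long right, long n) {
        if (left >= right) {
            throw new IllegalArgumentException("wrapping ranges are not handled in this sketch");
        }
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        BigInteger start = BigInteger.valueOf(left);
        // BigInteger avoids overflow: right - left can exceed Long.MAX_VALUE.
        BigInteger width = BigInteger.valueOf(right).subtract(start);
        BigInteger parts = BigInteger.valueOf(n);
        List<Long> bounds = new ArrayList<>();
        for (long i = 0; i <= n; i++) {
            bounds.add(start.add(width.multiply(BigInteger.valueOf(i)).divide(parts)).longValue());
        }
        return bounds;
    }
}
```

For example, 1000 partitions averaging 64 KiB with a 16 MiB target gives splitCount(1000, 65536, 16777216) == 4, and splitRange(0, 100, 4) yields the boundaries 0, 25, 50, 75, 100.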



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-06-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568971#comment-14568971
 ] 

Piotr Kołaczkowski commented on CASSANDRA-7688:
---

I double-checked that the cassandra.size_recorder_interval option does indeed 
work, but because many tests create fresh keyspaces/tables, even an interval of 
1 second is too long, and the test runs before the estimates are created. We'd 
need to put a Thread.sleep before every test, which we're not going to do, 
because it would ... So far we have just removed the warning about missing 
estimates. But it would be nice if C* filled in those estimate entries on table 
creation (even with zeroes). There is a difference between "we don't know the 
estimates" and "we know there is no data".
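
The distinction drawn above can be kept explicit on the client side. A minimal sketch, assuming a hypothetical EstimateLookup cache keyed by "keyspace.table" (not a driver or Cassandra API): an empty OptionalLong means no estimate has been recorded yet, while a present 0 means the table is known to be empty.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical client-side cache of size estimates that keeps "no estimate
// recorded yet" distinct from "estimated to hold no data".
final class EstimateLookup {
    // Keyed by "keyspace.table"; a missing key means no row was ever recorded.
    private final Map<String, Long> bytesByTable = new HashMap<>();

    void record(String table, long estimatedBytes) {
        bytesByTable.put(table, estimatedBytes);
    }

    /** Empty means unknown; a present 0 means the table is known to be empty. */
    OptionalLong estimatedBytes(String table) {
        Long bytes = bytesByTable.get(table);
        return bytes == null ? OptionalLong.empty() : OptionalLong.of(bytes);
    }
}
```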



[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-04-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394179#comment-14394179
 ] 

Piotr Kołaczkowski commented on CASSANDRA-7688:
---

So I must have had some schema left over from an early development branch, then. 
Thanks for the clarification.



[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-04-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394294#comment-14394294
 ] 

Piotr Kołaczkowski commented on CASSANDRA-7688:
---

Will there be a command to manually refresh statistics of a table from CQL 
(like ANALYZE TABLE ...)?
I need a way to trigger this in an integration test, and I don't want to wait 
until it is refreshed automatically after the update interval...
1. create table
2. add data
3. analyze (?)
4. check stats




[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-04-03 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394926#comment-14394926
 ] 

Aleksey Yeschenko commented on CASSANDRA-7688:
--

There most definitely won't be a separate CQL command just for that, but when 
we switch this to a virtual table implementation (when we have those) it might 
be as simple as {{UPDATE}}ing a boolean field in that table to trigger recalc.

We could temporarily add a JMX method. Or you could set the interval to be 
really low for now, and add some sleep.

I know it's a bit ugly, but it's just an interim measure.
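
In a test, the "set the interval really low and add some sleep" approach can be made less fragile by polling with a deadline instead of sleeping for a fixed, guessed time. A minimal sketch using only the JDK; Await.until is a hypothetical helper, and in a real test the condition would be something like "system.size_estimates has rows for my table", queried with whatever driver the test already uses.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical test helper: poll a condition until it holds or a deadline
// passes, instead of sleeping for a fixed amount of time.
final class Await {
    private Await() {}

    /** Returns true as soon as the condition holds, false if the timeout elapses first. */
    static boolean until(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Sleep for the poll interval, capped at the time left (minimum 1 ms).
            long sleepMillis = Math.max(1L, Math.min(pollInterval.toMillis(), remainingNanos / 1_000_000L));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
                return false;
            }
        }
    }
}
```

Compared with a single long sleep, the test finishes as soon as the estimates appear and still fails deterministically once the timeout is reached.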



[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-04-02 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393360#comment-14393360
 ] 

Aleksey Yeschenko commented on CASSANDRA-7688:
--

You probably just have schema left from running 2.1-head.



[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-04-02 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393367#comment-14393367
 ] 

Philip Thompson commented on CASSANDRA-7688:


I'm using ccm, so the data dirs are being created from scratch.



[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-04-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393296#comment-14393296
 ] 

Piotr Kołaczkowski commented on CASSANDRA-7688:
---

Why is this ticket marked as fixed in 2.1.5, if I can see this working in the 
just-released 2.1.4?



[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-04-02 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393375#comment-14393375
 ] 

Aleksey Yeschenko commented on CASSANDRA-7688:
--

Must be magic then.



[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-04-02 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393352#comment-14393352
 ] 

Philip Thompson commented on CASSANDRA-7688:


I see the system.size_estimates table in 2.1.4, but I don't see it being 
populated. Are you?



[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-04-02 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393394#comment-14393394
 ] 

Aleksey Yeschenko commented on CASSANDRA-7688:
--

Remembered now. It was committed to 2.1.3, but population was disabled before 
the release. So the table is still there; it's just that the actual sizing 
dumps aren't enabled until 2.1.5.



[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-02-09 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312887#comment-14312887
 ] 

Michael Shuler commented on CASSANDRA-7688:
---

This caused a regression:
from: http://cassci.datastax.com/job/cassandra-2.1_novnode_dtest/470/testReport/
to: http://cassci.datastax.com/job/cassandra-2.1_novnode_dtest/472/testReport/ 

repro with the bootstrap_test.py dtest: vnodes vs no-vnodes:
{noformat}
(master)mshuler@hana:~/git/cassandra-dtest$ nosetests -vs bootstrap_test.py 
read_from_bootstrapped_node_test (bootstrap_test.TestBootstrap) ... Created 
keyspaces. Sleeping 1s for propagation.
Warming up WRITE with 5 iterations...
INFO  19:17:03 Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy 
(if this is incorrect, please provide the correct datacenter name with 
DCAwareRoundRobinPolicy constructor)
INFO  19:17:03 New Cassandra host /127.0.0.2:9042 added
Connected to cluster: test
Datatacenter: datacenter1; Host: /127.0.0.1; Rack: rack1
INFO  19:17:03 New Cassandra host /127.0.0.3:9042 added
Datatacenter: datacenter1; Host: /127.0.0.3; Rack: rack1
Datatacenter: datacenter1; Host: /127.0.0.2; Rack: rack1
INFO  19:17:03 New Cassandra host /127.0.0.1:9042 added
Failed to connect over JMX; not collecting these stats
Sleeping 2s...
Running WRITE with 8 threads for 1 iteration
Failed to connect over JMX; not collecting these stats
total ops , adj row/s,op/s,pk/s,   row/s,mean, med, .95,
 .99,.999, max,   time,   stderr,  gc: #,  max ms,  sum ms,  sdv ms,
  mb
2403  ,  2403,2403,2403,2403, 3.3, 2.0,10.0,
16.2,23.8,27.3,1.0,  0.0,  0,   0,   0,   0,
   0
4231  ,  1806,1806,1806,1806, 4.4, 2.1,16.2,
27.0,67.2,72.5,2.0,  0.0,  0,   0,   0,   0,
   0
6796  ,  2624,2534,2534,2534, 3.1, 1.9, 9.0,
14.6,49.3,50.5,3.0,  0.10034,  0,   0,   0,   0,
   0
9449  ,  2684,2627,2627,2627, 3.0, 1.9, 8.8,
14.5,35.1,36.7,4.0,  0.08758,  0,   0,   0,   0,
   0
1 ,  2395,2395,2395,2395, 3.3, 1.8,10.0,
26.6,48.2,48.2,4.3,  0.07295,  0,   0,   0,   0,
   0


Results:
op rate   : 2345
partition rate: 2345
row rate  : 2345
latency mean  : 3.4
latency median: 1.9
latency 95th percentile   : 10.4
latency 99th percentile   : 19.7
latency 99.9th percentile : 42.6
latency max   : 72.5
total gc count: 0
total gc mb   : 0
total gc time (s) : 0
avg gc time(ms)   : NaN
stdev gc time(ms) : 0
Total operation time  : 00:00:04
END
ok
simple_bootstrap_test (bootstrap_test.TestBootstrap) ... ok

--
Ran 2 tests in 230.646s

OK
{noformat}

{noformat}
(master)mshuler@hana:~/git/cassandra-dtest$ export DISABLE_VNODES=true ; 
nosetests -vs bootstrap_test.py 
read_from_bootstrapped_node_test (bootstrap_test.TestBootstrap) ... Created 
keyspaces. Sleeping 1s for propagation.
Warming up WRITE with 5 iterations...
INFO  19:21:20 Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy 
(if this is incorrect, please provide the correct datacenter name with 
DCAwareRoundRobinPolicy constructor)
Connected to cluster: test
INFO  19:21:20 New Cassandra host /127.0.0.3:9042 added
Datatacenter: datacenter1; Host: /127.0.0.1; Rack: rack1
Datatacenter: datacenter1; Host: /127.0.0.3; Rack: rack1
INFO  19:21:20 New Cassandra host /127.0.0.2:9042 added
Datatacenter: datacenter1; Host: /127.0.0.2; Rack: rack1
INFO  19:21:20 New Cassandra host /127.0.0.1:9042 added
Failed to connect over JMX; not collecting these stats
Sleeping 2s...
Running WRITE with 8 threads for 1 iteration
Failed to connect over JMX; not collecting these stats
total ops , adj row/s,op/s,pk/s,   row/s,mean, med, .95,
 .99,.999, max,   time,   stderr,  gc: #,  max ms,  sum ms,  sdv ms,
  mb
6145  ,  6143,6143,6143,6143, 1.3, 0.9, 3.1,
 6.9,18.5,34.1,1.0,  0.0,  0,   0,   0,   0,
   0
1 ,  7485,7485,7485,7485, 1.0, 0.7, 2.6,
 4.9,11.0,16.3,1.5,  0.0,  0,   0,   0,   0,
   0


Results:
op rate   : 6599
partition rate: 6599
row rate  : 6599
latency mean  : 1.2
latency median: 0.8
latency 95th percentile   : 2.9
latency 99th percentile   : 6.2
latency 99.9th percentile : 14.9
latency max   : 34.1
total gc count
{noformat}

[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-02-09 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312945#comment-14312945
 ] 

Aleksey Yeschenko commented on CASSANDRA-7688:
--

Disabled it for now with fd6f9c6f9c15ab28d0db0edef1f84faaa7ea42c5.

I have a simple fix for DataTracker, but I want to investigate it further, so 
I'm pushing it to 2.1.4.

Sorry for the inconvenience.

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
Assignee: Aleksey Yeschenko
 Fix For: 2.1.3

 Attachments: 7688.txt


 Currently you can't implement something similar to describe_splits_ex purely 
 from a native protocol driver.  
 https://datastax-oss.atlassian.net/browse/JAVA-312 is open to make ownership 
 information easily available to clients of the java-driver.  But you still 
 need the data sizing part to get splits of a given size.  We should add the 
 sizing information to a system table so that native clients can get to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-02-06 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309060#comment-14309060
 ] 

Aleksey Yeschenko commented on CASSANDRA-7688:
--

You are quoting the wrong code here, but how do you *not* background it? It's 
not strictly about cost, it's about not having any other triggering mechanism.

When we add vtable support (CQL tables backed by classes, not sstables), we'll 
switch sizing (and several other system sstables) to that. Until then, what 
other options do we have?

This is a simple temporary replacement for describe_splits_ex; its *only* goal 
is to free Spark and others from having to maintain an extra Thrift connection 
*now*. Hence the lack of metrics and of configurability of the refresh interval.

I'm open to increasing/decreasing the hard-coded one, however, if you have 
better options.



[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-02-06 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309077#comment-14309077
 ] 

mck commented on CASSANDRA-7688:


{quote}You are quoting the wrong code here, but how do you not background it? 
{quote}

Yes, I can see that it's not possible at the moment. (I didn't realise that at 
first, but it really wasn't my train of thought either.)

{quote}When we add vtable support (cql tables backed by classes, not sstables) 
- then we'll switch sizing (and several other system sstables) to that.{quote}

Nice to know, thanks.

{quote}This is a simple temporary replacement for describe_splits_ex, its only 
goal is to free Spark and others from having to maintain an extra Thrift 
connection now. Hence the lack of metrics or configurability of the refresh 
interval.

I'm open to increasing/decreasing the hard-coded one, however, if you have 
better options.{quote}

I have no suggestion.
I'm more concerned/curious about why 5 minutes.
If there's no good answer, then aren't metrics important,
and being able to configure it?

Quick examples that come to mind: 
 - what if an installation has lots of jobs built upon each other's data, and for 
them there's a strong benefit (if not a requirement) in more accurate sizes 
(i.e. a faster schedule rate)?
 - what if there are bugs/load caused by this that could be avoided by configuring 
it to zero (disabling it), giving an immediate alternative to upgrading to or 
waiting for the next version?



[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-02-06 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309081#comment-14309081
 ] 

Aleksey Yeschenko commented on CASSANDRA-7688:
--

All good points. I'll add a -D option to change the interval/disable it.
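
A minimal sketch of what such a -D option could look like, assuming the property name cassandra.size_recorder_interval used elsewhere in this thread, a value in seconds, the 5-minute default discussed here, and 0 meaning "disabled". SizeRecorderScheduling is a hypothetical class built on a plain JDK ScheduledExecutorService, not Cassandra's ScheduledExecutors.

```java
import java.util.Optional;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: read the refresh interval (seconds) from a system
// property, default to 5 minutes, and treat 0 (or less) as "disabled".
final class SizeRecorderScheduling {
    static final String PROPERTY = "cassandra.size_recorder_interval";
    static final long DEFAULT_INTERVAL_SECONDS = 5 * 60;

    private SizeRecorderScheduling() {}

    static long intervalSeconds() {
        // Long.getLong also falls back to the default if the value isn't a number.
        return Long.getLong(PROPERTY, DEFAULT_INTERVAL_SECONDS);
    }

    /** Schedules the recorder with a fixed delay, or returns empty when disabled. */
    static Optional<ScheduledFuture<?>> schedule(ScheduledExecutorService executor, Runnable recorder) {
        long interval = intervalSeconds();
        if (interval <= 0) {
            return Optional.empty();
        }
        return Optional.of(executor.scheduleWithFixedDelay(recorder, interval, interval, TimeUnit.SECONDS));
    }
}
```

Started with -Dcassandra.size_recorder_interval=0, schedule(...) returns an empty Optional and nothing runs; with no property set, the task repeats every 300 seconds.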



[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-02-05 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308699#comment-14308699
 ] 

mck commented on CASSANDRA-7688:


{quote}Can you please elaborate on what the idea is behind storing this info in 
a system table?{quote}
I'm still curious about this question, as it wasn't about the removal of Thrift 
(that's obvious) but about the reasoning behind backgrounding the computation.

{code}ScheduledExecutors.optionalTasks.schedule(runnable, 5, TimeUnit.MINUTES);{code}
Why 5 minutes? What's the trade-off here?
How do we (everyone) know the computation is expensive enough to warrant 
backgrounding it?
And that 5 minutes will give us the best throughput (across C* and its 
Hadoop/Spark jobs)?

a) what about putting metrics around the code in SizeEstimatesRecorder.run() so 
we can get an idea for future adjustments?
(Going a step further, updateSizeEstimates() could diff the old rows against 
the new rows and expose a metric on change frequency.)

b) what about making the frequency configurable?
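
Both ideas in a) can be sketched with the JDK alone: wrap the periodic task to record how many times it ran and how long the last run took, and diff the previous and current per-range rows to count changes. RecorderMetrics is a hypothetical class, not Cassandra's metrics library; a real implementation might instead report through the server's existing metrics.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical instrumentation for a periodic recorder task:
// (a) time each run, (b) count ranges that changed since the previous run.
final class RecorderMetrics {
    final AtomicLong runs = new AtomicLong();
    final AtomicLong lastRunNanos = new AtomicLong();

    /** Wraps a task so every run updates the run count and last duration. */
    Runnable timed(Runnable task) {
        return () -> {
            long start = System.nanoTime();
            try {
                task.run();
            } finally {
                lastRunNanos.set(System.nanoTime() - start);
                runs.incrementAndGet();
            }
        };
    }

    /** Keys added, removed, or given a different value between two snapshots. */
    static <K, V> int changedEntries(Map<K, V> previous, Map<K, V> current) {
        int changed = 0;
        for (Map.Entry<K, V> entry : current.entrySet()) {
            K key = entry.getKey();
            // Added, or present before with a different value.
            if (!previous.containsKey(key) || !Objects.equals(previous.get(key), entry.getValue())) {
                changed++;
            }
        }
        for (K key : previous.keySet()) {
            if (!current.containsKey(key)) {
                changed++; // removed since the previous run
            }
        }
        return changed;
    }
}
```

A change count that stays near zero across runs would suggest the interval could be longer; a consistently high one would argue for a shorter interval.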



[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-02-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307176#comment-14307176
 ] 

Piotr Kołaczkowski commented on CASSANDRA-7688:
---

Ok, +1 then.



[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-02-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307108#comment-14307108
 ] 

Piotr Kołaczkowski commented on CASSANDRA-7688:
---

Looks good.

{code}
// delete all previous values with a single range tombstone.
mutation.deleteRange(SIZE_ESTIMATES_CF,
                     estimatesTable.comparator.make(table).start(),
                     estimatesTable.comparator.make(table).end(),
                     timestamp - 1);

// add a CQL row for each primary token range.
ColumnFamily cells = mutation.addOrGet(estimatesTable);
for (Map.Entry<Range<Token>, Pair<Long, Long>> entry : estimates.entrySet())
{
    Range<Token> range = entry.getKey();
    Pair<Long, Long> values = entry.getValue();
    Composite prefix = estimatesTable.comparator.make(table, range.left.toString(), range.right.toString());
    CFRowAdder adder = new CFRowAdder(cells, prefix, timestamp);
    adder.add("partitions_count", values.left)
         .add("mean_partition_size", values.right);
}

// the tombstone and all new rows are applied together, in one mutation.
mutation.apply();
{code}

Are updates of the table atomic? I can see you delete a whole bunch of token 
ranges with one tombstone and then add the new ones one by one. Is it possible 
to get an incomplete table when querying at the wrong moment?



[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-02-05 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307114#comment-14307114
 ] 

Aleksey Yeschenko commented on CASSANDRA-7688:
--

Since it's a single-partition update, the whole thing is atomic and isolated, 
yes. I'm adding updates to the mutation one by one, but applying everything, 
including the removal of the previous state and the addition of the new data, 
in one go, at the mutation.apply() point.

So long as you fetch all the ranges together in one query, you'll always have a 
complete state. It might be slightly out of date and lagging behind (rare) 
topology updates for up to 5 minutes, but it'll always be internally consistent.
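
The same guarantee (whole old state or whole new state, never a mix) can be illustrated with an in-memory analogy: build the complete new set of ranges first, then publish it with one reference swap. This is only an analogy using AtomicReference; it is not how Cassandra's Mutation or memtable code achieves partition-level atomicity, and EstimatesSnapshot is a hypothetical name.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// In-memory analogy only (not Cassandra's Mutation API): readers that fetch
// all ranges at once see either the whole old set or the whole new set.
final class EstimatesSnapshot {
    // Range label, e.g. "(0,100]", mapped to its partitions_count.
    private final AtomicReference<Map<String, Long>> current =
            new AtomicReference<Map<String, Long>>(Collections.<String, Long>emptyMap());

    /** Like "delete all previous rows + add the new rows" applied as one unit. */
    void replaceAll(Map<String, Long> newEstimates) {
        // Unmodifiable copy: later changes to the caller's map can't leak in.
        current.set(Collections.unmodifiableMap(new TreeMap<>(newEstimates)));
    }

    /** One consistent snapshot of every range. */
    Map<String, Long> readAll() {
        return current.get();
    }
}
```

As in the comment above, a reader that calls readAll() once gets an internally consistent set; a reader that fetched ranges piecemeal across two replaceAll calls could still mix generations.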



[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-02-03 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304338#comment-14304338
 ] 

Aleksey Yeschenko commented on CASSANDRA-7688:
--

Additionally, to reiterate what Sylvain said - we are open to improvements in 
accuracy, but those aren't trivial, and should go into another ticket.

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
Assignee: Aleksey Yeschenko
 Fix For: 2.1.3

 Attachments: 7688.txt




[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-01-30 Thread Matt Byrd (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299242#comment-14299242
 ] 

Matt Byrd commented on CASSANDRA-7688:
--

So I suppose the reason for suggesting exposing the same call via CQL was 
that, at least abstractly, it was clear what it meant.
I concede that plumbing all this through might not be straightforward.

The problem with putting it in a system table is, what exactly do you put there?

The current computation is a somewhat expensive, on-demand one that is 
generally done relatively rarely.

Was your intent to just periodically execute this function and dump the results 
into system tables?
Or did you have something different in mind?

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
Assignee: Aleksey Yeschenko
 Fix For: 2.1.3




[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-01-30 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299245#comment-14299245
 ] 

Aleksey Yeschenko commented on CASSANDRA-7688:
--

bq. Was your intent to just periodically execute this function and dump the 
results into system tables?

Not this exact function, but yes, just periodically dump sizing info there.
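A minimal sketch of "periodically dump sizing info", assuming a `ScheduledExecutorService` drives the refresh. `PeriodicSizeRecorder`, the estimator `Supplier` and the sink `Consumer` are hypothetical stand-ins for the real computation and the system-table write. The property name `cassandra.size_recorder_interval` appears elsewhere in this thread; reading it with `Long.getLong` and a 300-second default (matching the "up to 5 minutes" lag mentioned above) are assumptions.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical recorder: recompute estimates on a fixed schedule and overwrite the
// stored copy. The estimator and sink stand in for the real computation and the
// system-table write.
final class PeriodicSizeRecorder implements AutoCloseable {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    PeriodicSizeRecorder(Supplier<Map<String, Long>> estimator,
                         Consumer<Map<String, Long>> sink,
                         long intervalMillis) {
        // First run immediately, then intervalMillis after each run completes.
        executor.scheduleWithFixedDelay(() -> {
            try {
                sink.accept(estimator.get());
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs; keep the
                // previous estimates instead (a real recorder would log this).
            }
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Assumption: interval in seconds from the property named in this thread,
    // defaulting to 300 s to match the "up to 5 minutes" lag.
    static long intervalSecondsFromProperty() {
        return Long.getLong("cassandra.size_recorder_interval", 300L);
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

Readers only ever see the last completed run, so the stored data can lag by up to one interval; that is the trade-off against the on-demand computation discussed above.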


 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
Assignee: Aleksey Yeschenko
 Fix For: 2.1.3




[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-01-27 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294480#comment-14294480
 ] 

Aleksey Yeschenko commented on CASSANDRA-7688:
--

bq. I would have thought it’d be easier to just expose the Storage proxy call 
via cql?

It wouldn't, unless you propose to create an extra CQL statement just for this, 
which is something that's not gonna happen.

Otherwise you'd need support for virtual tables, and that's 3.1 territory at 
best.

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
Assignee: Aleksey Yeschenko
 Fix For: 2.1.3




[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-01-27 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294484#comment-14294484
 ] 

Aleksey Yeschenko commented on CASSANDRA-7688:
--

The primary goal is to remove the dependency on Thrift, so that the spark 
connector and our hadoop code don't have to open an extra Thrift connection in 
addition to the native protocol one. See CASSANDRA-8358 for example.

With that, we'd be able to not start Thrift by default, and to have simpler 
implementations of the Spark and Hadoop integrations.

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
Assignee: Aleksey Yeschenko
 Fix For: 2.1.3




[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-01-26 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291621#comment-14291621
 ] 

mck commented on CASSANDRA-7688:


{quote}It would be an on-demand calculation that would be moderately expensive. 
{quote}
[~iamaleksey] If the implementation is but a rewrite, I'm also keen on [~mbyrd]'s 
question.

{quote} Can you please elaborate on what the idea is behind storing this info 
in a system table?{quote}

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
Assignee: Aleksey Yeschenko
 Fix For: 2.1.3




[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2014-12-01 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229701#comment-14229701
 ] 

Piotr Kołaczkowski commented on CASSANDRA-7688:
---

It would also be nice to know the average partition size in the given table, 
both in bytes and in number of CQL rows. This would be useful for setting an 
appropriate fetch.size. Additionally, the current split generation API does not 
allow setting the split size in terms of data size in bytes or number of CQL 
rows, but only by number of partitions. Number of partitions doesn't make a nice 
default, as partitions can vary greatly in size and are extremely use-case 
dependent. So please, don't just copy the current describe_splits_ex 
functionality to the new driver, but *improve this*. 

We really don't need the driver / Cassandra to do the splitting for us. Instead 
we need to know:

1. estimate of total amount of data in the table in bytes
2. estimate of total number of CQL rows in the table
3. estimate of total number of partitions in the table

We're interested both in totals (whole cluster; logical sizes, i.e. without 
replicas) and in a breakdown by token range per node (physical; including replicas).
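The following Java sketch shows how a client could turn per-token-range estimates into splits sized in bytes rather than in partitions, which is what this comment asks for. It assumes each range reports a partition count and a mean partition size (the class and field names are hypothetical), that ranges do not wrap around the ring, and that data is spread uniformly across tokens inside a range.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

final class ByteSizedSplitter {
    // Hypothetical per-range estimate: a (start, end] token range plus two estimates.
    static final class RangeEstimate {
        final BigInteger start; // exclusive
        final BigInteger end;   // inclusive; assumed > start (no wrap-around)
        final long partitionsCount;
        final long meanPartitionBytes;

        RangeEstimate(long start, long end, long partitionsCount, long meanPartitionBytes) {
            this.start = BigInteger.valueOf(start);
            this.end = BigInteger.valueOf(end);
            this.partitionsCount = partitionsCount;
            this.meanPartitionBytes = meanPartitionBytes;
        }

        long estimatedBytes() {
            return partitionsCount * meanPartitionBytes;
        }
    }

    static long totalBytes(List<RangeEstimate> ranges) {
        long total = 0;
        for (RangeEstimate r : ranges) {
            total += r.estimatedBytes();
        }
        return total;
    }

    // Cut each range into ceil(bytes / targetBytes) equal-width sub-ranges (at least
    // one), assuming data is spread uniformly over the tokens inside the range.
    static List<BigInteger[]> split(List<RangeEstimate> ranges, long targetBytes) {
        List<BigInteger[]> splits = new ArrayList<>();
        for (RangeEstimate r : ranges) {
            long n = Math.max(1L, (r.estimatedBytes() + targetBytes - 1) / targetBytes);
            BigInteger width = r.end.subtract(r.start); // BigInteger avoids long overflow
            BigInteger prev = r.start;
            for (long i = 1; i <= n; i++) {
                BigInteger next = (i == n)
                        ? r.end
                        : r.start.add(width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
                splits.add(new BigInteger[] {prev, next});
                prev = next;
            }
        }
        return splits;
    }
}
```

For example, with a 400-byte target, a range estimated at 10 partitions of 100 bytes becomes three sub-ranges, while a 50-byte range stays whole. These are per-node figures; a cluster-wide logical total would additionally need replicas discounted (roughly, dividing by the replication factor), which this sketch does not do.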

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
 Fix For: 2.1.3




[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2014-12-01 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229818#comment-14229818
 ] 

Benedict commented on CASSANDRA-7688:
-

This is a fundamentally difficult problem, and to be answered accurately 
basically requires a full compaction. We can track or estimate this data for 
any given sstable easily, and we can estimate the number of overlapping 
partitions between two sstables (though I'm unsure of the accuracy if we 
composed this data across many sstables), but we cannot say how many rows 
within each overlapping partition overlap. The best we could do is probably 
sample some overlapping partitions to see what proportion of row overlap tends 
to prevail, and hope it is representative; if we assume a normal distribution 
of overlap ratio we could return error bounds.

I don't think it's likely this data could be maintained live, at least not 
accurately, or not without significant cost. It would be an on-demand 
calculation that would be moderately expensive. 
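One way to read the sampling proposal in this comment, written as a Java sketch (not Cassandra code; `OverlapEstimate` and the ratio definition are assumptions): for each sampled overlapping partition take ratio = rows after merge / rows summed across sstables, scale the naive row total by the mean ratio, and use a normal approximation, mean ± z·s/√n, for the error bounds.

```java
// Hypothetical helper (not Cassandra code) for the sampling idea above.
final class OverlapEstimate {
    final double low;
    final double point;
    final double high;

    private OverlapEstimate(double low, double point, double high) {
        this.low = low;
        this.point = point;
        this.high = high;
    }

    // naiveRowTotal: rows summed over every sstable, counting overlaps repeatedly.
    // sampledRatios: for each sampled overlapping partition,
    //   rows after merge / rows summed across its sstables (a value in (0, 1]).
    // z: normal quantile for the desired confidence, e.g. 1.96 for ~95%.
    static OverlapEstimate of(long naiveRowTotal, double[] sampledRatios, double z) {
        int n = sampledRatios.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double mean = 0.0;
        for (double r : sampledRatios) {
            mean += r;
        }
        mean /= n;
        double sumSq = 0.0;
        for (double r : sampledRatios) {
            sumSq += (r - mean) * (r - mean);
        }
        // Standard error of the mean, from the sample standard deviation.
        double stdErr = Math.sqrt(sumSq / (n - 1)) / Math.sqrt(n);
        // A ratio cannot leave [0, 1], so clamp the bounds.
        double lo = Math.max(0.0, mean - z * stdErr);
        double hi = Math.min(1.0, mean + z * stdErr);
        return new OverlapEstimate(naiveRowTotal * lo, naiveRowTotal * mean, naiveRowTotal * hi);
    }
}
```

As the comment warns, the bounds are only meaningful if the sampled partitions are representative; a skewed dataset can sit well outside them.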

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
 Fix For: 2.1.3




[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2014-12-01 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229828#comment-14229828
 ] 

Piotr Kołaczkowski commented on CASSANDRA-7688:
---

We only need estimates, not exact values. An error within a factor of 1.5x is 
considered an awesome estimate; 3x is still fairly good. 
Also note that Spark/Hadoop does many token range scans. Maybe collecting some 
statistics on the fly, during the scans (or during compaction), would be 
viable? And running a full compaction to get more accurate statistics: why 
not? You need to do it anyway to get top speed when scanning data in Spark, 
because a full table scan is doing a kind of implicit compaction anyway, isn't 
it? 

Also, one more thing - it would be good to have those values per column (sorry 
for making it even harder, I know it is not an easy task). At least to know 
that a column is responsible for xx% of data in the table - knowing such thing 
would make a huge difference when estimating data size, because we're not 
always fetching all columns and they may vary in size a lot (e.g. 
collections!). Some sampling on insert would probably be enough.
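Per-column shares via "some sampling on insert" could look roughly like the following Java sketch. `ColumnShareSampler` is hypothetical, not an existing Cassandra component: it samples each write with probability p and accumulates per-column byte sizes. Because the result is a ratio, the sampling rate cancels out of the share.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical sampler, not an existing Cassandra component.
final class ColumnShareSampler {
    private final double sampleProbability;
    private final Random random;
    private final Map<String, Long> sampledBytesByColumn = new HashMap<>();
    private long sampledTotalBytes;

    ColumnShareSampler(double sampleProbability, long seed) {
        this.sampleProbability = sampleProbability;
        this.random = new Random(seed);
    }

    // Called on each write with column name -> serialized size in bytes.
    // Not thread-safe; a real write path would need concurrent counters.
    void onInsert(Map<String, Integer> columnSizes) {
        if (random.nextDouble() >= sampleProbability) {
            return; // this write is not sampled
        }
        for (Map.Entry<String, Integer> e : columnSizes.entrySet()) {
            sampledBytesByColumn.merge(e.getKey(), e.getValue().longValue(), Long::sum);
            sampledTotalBytes += e.getValue();
        }
    }

    // Fraction of sampled bytes attributed to a column (0 if nothing was sampled).
    double share(String column) {
        if (sampledTotalBytes == 0) {
            return 0.0;
        }
        return sampledBytesByColumn.getOrDefault(column, 0L) / (double) sampledTotalBytes;
    }
}
```

A reader fetching only some columns could then scale a byte estimate for the whole table by the sum of the shares of the columns it selects.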


 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
 Fix For: 2.1.3




[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2014-12-01 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229831#comment-14229831
 ] 

Benedict commented on CASSANDRA-7688:
-

I'm talking about estimates. We cannot likely even estimate without pretty 
significant cost. Sampling column counts is pretty easy, but knowing how many 
cql rows there are for any merged row is not. There are tricks to make it 
easier, but there are datasets for which the tricks will not work, and any 
estimate would be complete guesswork without sampling the data.

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
 Fix For: 2.1.3




[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2014-12-01 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230035#comment-14230035
 ] 

Sylvain Lebresne commented on CASSANDRA-7688:
-

To be clear, the target here is Hadoop/Spark, and we're not looking at doing 
anything better than what is currently used by Thrift's describe_splits, which is 
based on the sstable stats. That, yes, can be pretty bad for some datasets, 
but improving it is a goal for another ticket.

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
 Fix For: 2.1.3




[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2014-12-01 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230051#comment-14230051
 ] 

Piotr Kołaczkowski commented on CASSANDRA-7688:
---

Fair enough. I'm just saying describe_splits is pretty bad because it is not 
possible to set a reasonable default split size. Some users have already 
pointed that out in our issue tracker. 

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
 Fix For: 2.1.3




[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2014-08-08 Thread Matt Byrd (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091315#comment-14091315
 ] 

Matt Byrd commented on CASSANDRA-7688:
--

Originally I was just thinking of exposing the same method available in Thrift, 
via some CQL syntax, i.e. essentially this one from StorageProxy:
   public List<Pair<Range<Token>, Long>> getSplits(String keyspaceName, String cfName, Range<Token> range, int keysPerSplit, CFMetaData metadata)

This in turn actually operates on the index intervals in memory, getting 
appropriately sized splits given the samples taken.

Can you please elaborate on what the idea is behind storing this info in a 
system table?
It would seem that you would need to keep doing the above computation or 
something similar and write the result to a system table.
I would have thought it’d be easier to just expose the Storage proxy call via 
cql?
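The getSplits approach described in this comment, splitting a range using the in-memory index samples, can be sketched in simplified form. This is not the StorageProxy implementation: `SampleSplitter` is hypothetical, tokens are plain longs, and each sample is assumed to stand for about `indexInterval` keys, so a boundary is emitted every keysPerSplit / indexInterval samples.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical sketch; not the StorageProxy/StorageService implementation.
final class SampleSplitter {
    static final class Split {
        final long start;         // exclusive
        final long end;           // inclusive
        final long estimatedKeys; // samples in the split * indexInterval

        Split(long start, long end, long estimatedKeys) {
            this.start = start;
            this.end = end;
            this.estimatedKeys = estimatedKeys;
        }
    }

    // sortedSamples: sampled tokens in ascending order, all inside (rangeStart, rangeEnd].
    // Each sample is assumed to stand for about indexInterval keys.
    static List<Split> getSplits(long rangeStart, long rangeEnd, long[] sortedSamples,
                                 int indexInterval, int keysPerSplit) {
        // How many samples add up to roughly keysPerSplit keys (at least one).
        int samplesPerSplit = Math.max(1, keysPerSplit / indexInterval);
        List<Split> splits = new ArrayList<>();
        long start = rangeStart;
        int samplesInCurrent = 0;
        for (long token : sortedSamples) {
            samplesInCurrent++;
            // Cut at this sample, unless it is the range end (the final split covers that).
            if (samplesInCurrent == samplesPerSplit && token < rangeEnd) {
                splits.add(new Split(start, token, (long) samplesInCurrent * indexInterval));
                start = token;
                samplesInCurrent = 0;
            }
        }
        // The remainder, possibly without samples, closes the range.
        splits.add(new Split(start, rangeEnd, (long) samplesInCurrent * indexInterval));
        return splits;
    }
}
```

With one sample per 128 keys and keysPerSplit = 256, every second sample becomes a split boundary; the key counts are only as good as the sampling.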

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
 Fix For: 2.1.1

