[jira] [Comment Edited] (CASSANDRA-7464) Replace sstable2json and json2sstable

2016-01-26 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117863#comment-15117863
 ] 

Chris Lohfink edited comment on CASSANDRA-7464 at 1/27/16 5:34 AM:
---

> Do we need to create KSMetaData and put it into Schema?

I did at the time to prevent an NPE, but that was because I was actually using 
CQL to create the cfmetadata and it looked for the reference in Schema. That 
isn't necessary now, though, so it's good to be removed.

> Do currentScanner / position need to be Atomic?

Absolutely not. Just used it as a wrapper for mutability within lambdas.
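
As a minimal sketch of what "a wrapper for mutability within lambdas" means: Java lambdas can only capture effectively final locals, so a value that the lambda needs to update has to live inside some mutable holder. The class and method names below are hypothetical and are not taken from the patch; the sketch only shows that an atomic type and a plain one-element array both work as such a holder when no concurrency is involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

final class LambdaHolderSketch {

    // A lambda cannot reassign a captured local `long`, so the running total
    // is kept in an AtomicLong that is used purely as a mutable box.
    static long sumWithAtomic(List<Long> values) {
        AtomicLong total = new AtomicLong();
        values.forEach(v -> total.addAndGet(v));
        return total.get();
    }

    // Same effect without atomics: the array reference is effectively final,
    // while its single element stays mutable.
    static long sumWithArray(List<Long> values) {
        long[] total = {0L};
        values.forEach(v -> total[0] += v);
        return total[0];
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(1L, 2L, 3L);
        System.out.println(sumWithAtomic(values));
        System.out.println(sumWithArray(values));
    }
}
```

Either form is fine for single-threaded use; the atomic variant simply pays for thread-safety guarantees that are not needed here.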

PR with some changes: https://github.com/yukim/cassandra/pull/2


was (Author: cnlwsu):
> Do we need to create KSMetaData and put it into Schema?

I did at the time to prevent an NPE, but that was because I was actually using 
CQL to create the cfmetadata and it looked for the reference in Schema. That 
isn't necessary now, though, so it's good to be removed.

> Do currentScanner / position need to be Atomic?

Absolutely not. Just used it as a wrapper for mutability within lambdas. New 
patch incoming that cleans this up.

> Replace sstable2json and json2sstable
> -
>
> Key: CASSANDRA-7464
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7464
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Sylvain Lebresne
> Assignee: Chris Lohfink
> Priority: Minor
> Fix For: 3.x
>
> Attachments: sstable-only.patch, sstabledump.patch
>
>
> Both tools are pretty awful. They are primarily meant for debugging (there are 
> much more efficient and convenient ways to import/export data), but their 
> output manages to be hard to handle both for humans and for tools (especially 
> as soon as you have modern stuff like composites).
> There is value in having tools to export sstable contents into a format that 
> is easy to manipulate by humans and tools for debugging, small hacks and 
> general tinkering, but sstable2json and json2sstable are not that.  
> So I propose that we deprecate those tools and consider writing better 
> replacements. It shouldn't be too hard to come up with an output format that 
> is more aware of modern concepts like composites, UDTs, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7464) Replace sstable2json and json2sstable

2016-01-26 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117863#comment-15117863
 ] 

Chris Lohfink edited comment on CASSANDRA-7464 at 1/26/16 8:17 PM:
---

> Do we need to create KSMetaData and put it into Schema?

I did at the time to prevent an NPE, but that was because I was actually using 
CQL to create the cfmetadata and it looked for the reference in Schema. That 
isn't necessary now, though, so it's good to be removed.

> Do currentScanner / position need to be Atomic?

Absolutely not. Just used it as a wrapper for mutability within lambdas. New 
patch incoming that cleans this up.


was (Author: cnlwsu):
> Do we need to create KSMetaData and put it into Schema?

I did at the time to prevent an NPE, but that was because I was actually using 
CQL to create the cfmetadata and it looked for the reference in Schema. That 
isn't necessary now, though, so it's good to be removed.

> Do currentScanner / position need to be Atomic?

Absolutely not. Just used it as a wrapper for mutability within lambdas. I 
wanted just a single ISSTableScanner, so that the one instance could be created 
for both list-of-keys and whole-sstable scans. Then the whole thing would not 
have been necessary. But creating the collection of Ranges for the DataRange 
instead of Bounds turned into a mess (is there a way of turning Bounds into a 
Range? Overriding all the Token impls to have an inc/dec etc. was intimidating).


[jira] [Comment Edited] (CASSANDRA-7464) Replace sstable2json and json2sstable

2016-01-25 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116784#comment-15116784
 ] 

Chris Lohfink edited comment on CASSANDRA-7464 at 1/26/16 6:44 AM:
---

A debugging format may be nice for the "one per line" output, and we can do 
that pretty easily using the UnfilteredRow.toString, so instead of 
{code}
[
  {
    "partition" : {
      "key" : [ "127.0.0.1-getWriteLatencyHisto" ],
      "position" : 19385620
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 19385664,
        "clustering" : [ "694621867" ],
        "cells" : [
          { "name" : "value", "value" : "00", "tstamp" : 1452861829846001, "ttl" : 604800, "expires_at" : 1453466629, "expired" : true }
        ]
      },
      {
        "type" : "row",
        "position" : 19385686,
        "clustering" : [ "694621927" ],
        "cells" : [
          { "name" : "value", "value" : "00", "tstamp" : 1452861769124000, "ttl" : 604800, "expires_at" : 1453466569, "expired" : true }
        ]
      },
      {
        "type" : "row",
        "position" : 19385708,
        "clustering" : [ "694621987" ],
        "cells" : [
          { "name" : "value", "value" : "00", "tstamp" : 1452861709303002, "ttl" : 604800, "expires_at" : 1453466509, "expired" : true }
        ]
      },
      {
        "type" : "row",
        "position" : 19385730,
        "clustering" : [ "694622047" ],
        "cells" : [
          { "name" : "value", "value" : "00", "tstamp" : 1452861649548002, "ttl" : 604800, "expires_at" : 1453466449, "expired" : true }
        ]
      },
...
{code}
it can be
{code}
[127.0.0.1-getWriteLatencyHisto]@19385620 Row[info=[ts=-9223372036854775808] ]: 694621867 | [value=00 ts=1452861829846001 ttl=604800 ldt=1453466629]
[127.0.0.1-getWriteLatencyHisto]@19385686 Row[info=[ts=-9223372036854775808] ]: 694621927 | [value=00 ts=1452861769124000 ttl=604800 ldt=1453466569]
[127.0.0.1-getWriteLatencyHisto]@19385708 Row[info=[ts=-9223372036854775808] ]: 694621987 | [value=00 ts=1452861709303002 ttl=604800 ldt=1453466509]
[127.0.0.1-getWriteLatencyHisto]@19385730 Row[info=[ts=-9223372036854775808] ]: 694622047 | [value=00 ts=1452861649548002 ttl=604800 ldt=1453466449]
...
{code}
This would also have the benefit of making files easy to split for Hadoop jobs 
etc., since there would be one CQL row per line (easing the wide-partition 
issues with the compact output from Russell's discussion in the other ticket). 
It would also tie the rendering to something already maintained for debug 
logging etc., so refactoring/storage changes would need little additional work. 
I am kind of a fan of both, so I implemented a {{-d}} option (it could use a 
better name) for the one-row-per-line "debuggy" compact output (worth noting 
that this is very hard to read if there are a lot of cells).
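
To illustrate why one row per line is convenient for tooling, here is a minimal sketch that pulls the partition key and scanner position out of a single debug line. It assumes each row occupies exactly one physical line and starts with the `[key]@position` prefix seen in the sample output; the class name, method name and regular expression are hypothetical and are not part of the patch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DebugLineSketch {

    // Matches the "[partition key]@position" prefix seen in the sample output.
    private static final Pattern PREFIX = Pattern.compile("^\\[(.+?)\\]@(\\d+)\\s");

    // Returns {partitionKey, position} for a debug line, or null when the
    // line does not start with the expected prefix.
    static String[] keyAndPosition(String line) {
        Matcher m = PREFIX.matcher(line);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String line = "[127.0.0.1-getWriteLatencyHisto]@19385620 "
                    + "Row[info=[ts=-9223372036854775808] ]: 694621867 | "
                    + "[value=00 ts=1452861829846001 ttl=604800 ldt=1453466629]";
        String[] parsed = keyAndPosition(line);
        System.out.println(parsed[0] + " at " + parsed[1]);
    }
}
```

Because every line carries its own partition key, a file in this format could be split at any line boundary without losing context.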

Also added the current position from the scanner in the results (see above 
examples).

Until CASSANDRA-9587 is done, I had to add an alternative that does not print 
the clustering key names in the toString, since they are not available 
anywhere. That is a little hacky but can be removed once we have the names.



[jira] [Comment Edited] (CASSANDRA-7464) Replace sstable2json and json2sstable

2015-12-29 Thread Russell Bradberry (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074017#comment-15074017
 ] 

Russell Bradberry edited comment on CASSANDRA-7464 at 12/29/15 4:27 PM:


I would like to see an option for an output method that is more digestible 
by scripts.  The old sstable2json and, currently, this one output the entire 
SSTable as a single pretty-formatted array.  This is great for visually 
inspecting it but requires loading the entire SSTable into memory before 
JSON-parsing it.  There are tools that attempt to read a large JSON stream and 
emit objects as they are completed, but these are rather cumbersome and 
difficult to use, and they tend to differ from language to language.

What I would propose is a command-line option that outputs one partition per 
line (escaping any newlines encountered) without any leading or trailing 
brackets or commas.  This would allow an application to read one partition at 
a time and work on it in a streaming fashion.
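
A minimal sketch of how a consumer could process such output in a streaming fashion, assuming one partition per line. The class and method names are hypothetical, and the JSON in `main` is a made-up two-partition dump rather than real tool output; each line would be handed to whatever JSON parser the application already uses.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

final class PartitionPerLineSketch {

    // Hands each non-blank line (one partition) to `handler` and returns the
    // number of partitions seen. Only the current line is held in memory.
    static long forEachPartition(BufferedReader in, Consumer<String> handler) {
        long count = 0;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue; // tolerate blank lines
                handler.accept(line);                // e.g. parse this one JSON object
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical dump; the field names are illustrative only.
        String dump = "{\"partition\":{\"key\":[\"a\"]},\"rows\":[]}\n"
                    + "{\"partition\":{\"key\":[\"b\"]},\"rows\":[]}\n";
        long n = forEachPartition(new BufferedReader(new StringReader(dump)),
                                  line -> System.out.println(line.length()));
        System.out.println(n + " partitions");
    }
}
```

Memory use is bounded by the largest single partition rather than by the size of the whole SSTable, which is the point of the proposal.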

I also put my thoughts on this in this github issue: 
https://github.com/tolbertam/sstable-tools/issues/19


was (Author: devdazed):
I would like to see an option to have an output method that is more digestible 
by scripts.  The old sstable2json and currently this one, output the entire 
SSTable as a single array that is pretty-formatted.  This is great for visually 
looking at it but requires the loading of an entire SSTable into memory before 
JSON parsing it.  There are tools that attempt to read a large JSON stream and 
emit objects as they are complete, but these are rather cumbersome and 
difficult to use, and also tend to be different from language to language.

What I would propose is to have a command line option that will output one 
partition per line (escaping any newlines encountered) without any leading or 
trailing brackets or commas.  This will allow for an application to be able to 
read one partition at a time and work on it in a streaming fashion.

I also put my thoughts on this in this github issue: 
https://github.com/tolbertam/sstable-tools/issues/19


[jira] [Comment Edited] (CASSANDRA-7464) Replace sstable2json and json2sstable

2015-12-29 Thread Russell Bradberry (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074017#comment-15074017
 ] 

Russell Bradberry edited comment on CASSANDRA-7464 at 12/29/15 3:54 PM:


I would like to see an option to have an output method that is more digestible 
by scripts.  The old sstable2json and currently this one, output the entire 
SSTable as a single array that is pretty-formatted.  This is great for visually 
looking at it but requires the loading of an entire SSTable into memory before 
JSON parsing it.  There are tools that attempt to read a large JSON stream and 
emit objects as they are complete, but these are rather cumbersome and 
difficult to use, and also tend to be different from language to language.

What I would propose is to have a command line option that will output one 
partition per line (escaping any newlines encountered) without any leading or 
trailing brackets or commas.  This will allow for an application to be able to 
read one partition at a time and work on it in a streaming fashion.

I also put my thoughts on this in this github issue: 
https://github.com/tolbertam/sstable-tools/issues/19


was (Author: devdazed):
Personally I would like to see an option to have an output method that is more 
digestible by scripts.  The old sstable2json and currently this one, output the 
entire SSTable as a single array that is pretty-formatted.  This is great for 
visually looking at it but requires the loading of an entire SSTable into 
memory before JSON parsing it.  There are tools that attempt to read a large 
JSON stream and emit objects as they are complete, but these are rather 
cumbersome and difficult to use, and also tend to be different from language to 
language.

What I would propose is to have a command line option that will output one 
partition per line (escaping any newlines encountered) without any leading or 
trailing brackets or commas.  This will allow for an application to be able to 
read one partition at a time and work on it in a streaming fashion.

I also put my thoughts on this in this github issue: 
https://github.com/tolbertam/sstable-tools/issues/19


[jira] [Comment Edited] (CASSANDRA-7464) Replace sstable2json and json2sstable

2015-11-19 Thread Sebastian Estevez (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013959#comment-15013959
 ] 

Sebastian Estevez edited comment on CASSANDRA-7464 at 11/19/15 5:25 PM:


For a general understanding of how SSTables and compaction work, sstable2json 
is invaluable. 3.0 dropped last week, yet this issue has not been prioritized.

Here are a few great posts that only exist because the community had the 
tooling to introspect sstables:
http://www.planetcassandra.org/blog/qa-starters-guide-to-cassandra/
http://thelastpickle.com/blog/2011/05/15/Deletes-and-Tombstones.html
http://www.jsravn.com/2015/05/13/cassandra-tombstones-collections.html

Folks also used cassandra-cli for this, which I think we also deprecated (7920):
http://opensourceconnections.com/blog/2013/07/24/understanding-how-cql3-maps-to-cassandras-internal-data-structure/

There are lots more where these came from; I just wanted to show some good 
examples of why the tools are useful. +1 on this Jira.


was (Author: sebastian.este...@datastax.com):
For a general understanding of how SSTables and compaction work, sstable2json 
is invaluable. 3.0 dropped last week, yet this issue has not been prioritized.

Here are a couple of great posts that only exist because the community had the 
tooling to introspect sstables:
http://www.planetcassandra.org/blog/qa-starters-guide-to-cassandra/
http://thelastpickle.com/blog/2011/05/15/Deletes-and-Tombstones.html

Folks also used cassandra-cli for this, which I think we also deprecated:
http://opensourceconnections.com/blog/2013/07/24/understanding-how-cql3-maps-to-cassandras-internal-data-structure/

There are lots more where these came from; I just wanted to show some good 
examples of why the tools are useful. +1 on this Jira.