[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2016-02-23 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159235#comment-15159235
 ] 

Yuki Morishita commented on CASSANDRA-7464:
---

Fixed one more bug (handling of case-sensitive column names) and backported to 
3.0 as well.

||branch||testall||dtest||
|[7464-3.0|https://github.com/yukim/cassandra/tree/7464-3.0]|[testall|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-7464-3.0-testall/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-7464-3.0-dtest/lastCompletedBuild/testReport/]|
|[7464|https://github.com/yukim/cassandra/tree/7464]|[testall|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-7464-testall/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-7464-dtest/lastCompletedBuild/testReport/]|

Tests are running.

> Replace sstable2json and json2sstable
> -
>
> Key: CASSANDRA-7464
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7464
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Sylvain Lebresne
> Assignee: Chris Lohfink
> Priority: Minor
> Fix For: 3.0.x, 3.x
>
> Attachments: sstable-only.patch, sstabledump.patch
>
>
> Both tools are pretty awful. They are primarily meant for debugging (there are 
> much more efficient and convenient ways to import/export data), but their 
> output manages to be hard to handle both for humans and for tools (especially 
> as soon as you have modern stuff like composites).
> There is value in having tools to export sstable contents into a format that 
> is easy to manipulate by humans and tools for debugging, small hacks and 
> general tinkering, but sstable2json and json2sstable are not that.  
> So I propose that we deprecate those tools and consider writing better 
> replacements. It shouldn't be too hard to come up with an output format that 
> is more aware of modern concepts like composites, UDTs, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2016-02-22 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157910#comment-15157910
 ] 

Jeremiah Jordan commented on CASSANDRA-7464:


Looks like your branch is against trunk. Can we add this to 3.0?  The 
cassandra-3.0 branch currently has a regression: it has no sstable dump tool.



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2016-02-22 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157884#comment-15157884
 ] 

Yuki Morishita commented on CASSANDRA-7464:
---

Now that https://github.com/yukim/cassandra/pull/2 is merged, I added one more 
commit to error out nicely when a pre-3.0 SSTable is given, and added more help 
text.

||branch||testall||dtest||
|[7464|https://github.com/yukim/cassandra/tree/7464]|[testall|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-7464-testall/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-7464-dtest/lastCompletedBuild/testReport/]|

If tests are good, I will commit.



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2016-01-26 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117772#comment-15117772
 ] 

Yuki Morishita commented on CASSANDRA-7464:
---

Thanks for the patch.

Pushed your patch + suggestions to 
https://github.com/yukim/cassandra/commits/7464.

* We already have getters in SerializationHeader
* I want to avoid referencing DatabaseDescriptor as much as possible

Questions:

* Do we need to create KSMetaData and put it into Schema?
* Do currentScanner / position need to be Atomic*?

Some more suggestions:

* If we can move the code that opens an SSTable standalone to, say, 
SSTableReader, we can easily reuse it for other offline tools.
* If a key given with the '-k' option does not exist in the SSTable, the stream 
terminates with the following error:

{code}
Exception in thread "main" java.util.NoSuchElementException
    at org.apache.cassandra.utils.AbstractIterator.next(AbstractIterator.java:64)
    at org.apache.cassandra.io.sstable.format.big.BigTableScanner.next(BigTableScanner.java:247)
    at org.apache.cassandra.io.sstable.format.big.BigTableScanner.next(BigTableScanner.java:51)
    at org.apache.cassandra.tools.SSTableExport.lambda$main$311(SSTableExport.java:228)
    at org.apache.cassandra.tools.SSTableExport$$Lambda$18/2136288211.apply(Unknown Source)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.SortedOps$RefSortingSink$$Lambda$20/117009527.accept(Unknown Source)
    at java.util.ArrayList.forEach(ArrayList.java:1249)
    at java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:390)
    at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
    at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
    at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:513)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.apache.cassandra.tools.JsonTransformer.toJson(JsonTransformer.java:76)
    at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:255)
{code}

Shouldn't we just skip and continue?
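
A minimal sketch of the "skip and continue" behaviour, written in Python rather than against the real scanner code. The dict below is only a hypothetical in-memory stand-in for an SSTable's partitions, and {{scan_keys}} is an illustrative name, not a function in the patch:

```python
def scan_keys(partitions, requested_keys):
    """Yield (key, rows) for each requested key that exists, skipping the rest."""
    for key in requested_keys:
        rows = partitions.get(key)
        if rows is None:
            # Missing key: skip it instead of raising, then move on to the next.
            continue
        yield key, rows


# Hypothetical stand-in for the partitions stored in one SSTable.
partitions = {"k1": ["row-a"], "k3": ["row-b", "row-c"]}

# "k2" does not exist; it is silently skipped and the scan continues with "k3".
found = list(scan_keys(partitions, ["k1", "k2", "k3"]))
print(found)
```

The point is only that a missing key produces no output for that key while the remaining keys are still processed.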



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2016-01-26 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117863#comment-15117863
 ] 

Chris Lohfink commented on CASSANDRA-7464:
--

> Do we need to create KSMetaData and put it into Schema?

I did at the time to prevent an NPE, but that was because I was actually using 
CQL to create the cfmetadata and it looked for the reference in Schema. That 
isn't necessary now, though, so it's good to be removed.

> Do currentScanner / position need to be Atomic*?

Absolutely not. I just used them as wrappers for mutability within lambdas. I 
wanted just a single ISSTableScanner, so that the same one would be created for 
both list-of-keys and whole-sstable scans; then the whole thing would not have 
been necessary. But creating the collection of Ranges for the DataRange 
instead of Bounds turned into a mess (is there a way of turning Bounds into a 
Range? Overriding all the Token impls to have an inc/dec etc. was intimidating).



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2016-01-25 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116784#comment-15116784
 ] 

Chris Lohfink commented on CASSANDRA-7464:
--

A debugging format may be nice for both the "one per line" request and the more 
readable output requested earlier in this thread. We can do that pretty easily 
using the UnfilteredRow.toString, so instead of
{code}
[
  {
    "partition" : {
      "key" : [ "127.0.0.1-getWriteLatencyHisto" ],
      "position" : 19385620
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 19385664,
        "clustering" : [ "694621867" ],
        "cells" : [
          { "name" : "value", "value" : "00", "tstamp" : 1452861829846001, "ttl" : 604800, "expires_at" : 1453466629, "expired" : true }
        ]
      },
      {
        "type" : "row",
        "position" : 19385686,
        "clustering" : [ "694621927" ],
        "cells" : [
          { "name" : "value", "value" : "00", "tstamp" : 1452861769124000, "ttl" : 604800, "expires_at" : 1453466569, "expired" : true }
        ]
      },
      {
        "type" : "row",
        "position" : 19385708,
        "clustering" : [ "694621987" ],
        "cells" : [
          { "name" : "value", "value" : "00", "tstamp" : 1452861709303002, "ttl" : 604800, "expires_at" : 1453466509, "expired" : true }
        ]
      },
      {
        "type" : "row",
        "position" : 19385730,
        "clustering" : [ "694622047" ],
        "cells" : [
          { "name" : "value", "value" : "00", "tstamp" : 1452861649548002, "ttl" : 604800, "expires_at" : 1453466449, "expired" : true }
        ]
      },
...
{code}
it can be
{code}
[127.0.0.1-getWriteLatencyHisto]@19385620 Row[info=[ts=-9223372036854775808] ]: 694621867 | [value=00 ts=1452861829846001 ttl=604800 ldt=1453466629]
[127.0.0.1-getWriteLatencyHisto]@19385686 Row[info=[ts=-9223372036854775808] ]: 694621927 | [value=00 ts=1452861769124000 ttl=604800 ldt=1453466569]
[127.0.0.1-getWriteLatencyHisto]@19385708 Row[info=[ts=-9223372036854775808] ]: 694621987 | [value=00 ts=1452861709303002 ttl=604800 ldt=1453466509]
[127.0.0.1-getWriteLatencyHisto]@19385730 Row[info=[ts=-9223372036854775808] ]: 694622047 | [value=00 ts=1452861649548002 ttl=604800 ldt=1453466449]
...
{code}
This would also have the benefit of making it easy to split files for Hadoop 
jobs etc., since it would have one CQL row per line (easing the wide partition 
issues with the compact output mentioned above). It would also tie the rendering 
to something already maintained for debug logging etc., so there is little 
additional work for refactoring/storage changes. I am kind of a fan of both, so 
I implemented a {{-d}} option (it could use a better name) for the 
one-row-per-line "debuggy" compact output (worth noting: this is very hard to 
read if there are a lot of cells).

Also added the current position from the scanner in the results (see above 
examples).
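
As a side note on reading the sample output: in the four cells shown above, {{expires_at}} (and {{ldt}} in the debug format) is consistent with the write timestamp truncated from microseconds to seconds, plus the TTL. The sketch below only checks that arithmetic against the sample values; it is an observation about this sample, not a statement of how the tool computes the field, and the function name is hypothetical.

```python
def expected_expiry(tstamp_micros, ttl_seconds):
    """Write timestamp (microseconds) truncated to seconds, plus the TTL in seconds."""
    return tstamp_micros // 1_000_000 + ttl_seconds


# (tstamp, ttl, expires_at) triples copied from the sample cells above.
samples = [
    (1452861829846001, 604800, 1453466629),
    (1452861769124000, 604800, 1453466569),
    (1452861709303002, 604800, 1453466509),
    (1452861649548002, 604800, 1453466449),
]

for tstamp, ttl, expires_at in samples:
    # Each sample cell satisfies the relationship exactly.
    assert expected_expiry(tstamp, ttl) == expires_at
```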

Until CASSANDRA-9587 is resolved, I had to add an alternative that does not 
print clustering key names in the toString, since they are not available 
anywhere. This is a little hacky but can be removed once we have the names.



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2016-01-11 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091990#comment-15091990
 ] 

Jonathan Ellis commented on CASSANDRA-7464:
---

bq. my personal preference would be to stop calling that sstable2json, but 
rather have a tool whose purpose is to inspect sstables, since that's what 
people use it for in the first place anyway. That tool could then ultimately 
output multiple formats, json being only one of them and we could have 
something more readable otherwise (and could very well have some more useful 
information like file offsets and such which could help a lot when debugging)

I don't see a point in creating an ad-hoc format along with json.  There's no 
reason we couldn't include file offsets in json.

bq. I'd also personally prefer not re-adding json2sstable. I think that tool is 
a lot less justified since you can easily create sstable (from whatever you 
want) using CQLSSTableWriter.

bq. How about sstabledump? 

+1



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2016-01-11 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092023#comment-15092023
 ] 

Sylvain Lebresne commented on CASSANDRA-7464:
-

bq. I don't see a point in creating an ad-hoc format along with json

If I'm just "debugging" a sstable (because, say, a user has a bug that I'm 
investigating), i.e. I'm interested in checking the output manually, not 
processing it, then I personally very much prefer something like:
{noformat}
["e", "f", "g", "h"](ts=1451330118497426) : [value=1](ts=1451330118497426), [other_col='foo']
["a", "b", "c", "d"](ts=1451330118479576) : [value=3], [other_col='bar'](ttl=3)
{noformat}
to
{noformat}
[
  {
    "partition" : {
      "key" : [ "e", "f" ]
    },
    "rows" : [
      {
        "type" : "row",
        "clustering" : [ "g", "h" ],
        "liveness_info" : { "tstamp" : 1451330118497426 },
        "cells" : [
          { "name" : "value", "value" : "2" }
        ]
      }
    ]
  },
  {
    "partition" : {
      "key" : [ "a", "b" ]
    },
    "rows" : [
      {
        "type" : "row",
        "clustering" : [ "c", "d" ],
        "liveness_info" : { "tstamp" : 1451330118479576 },
        "cells" : [
          { "name" : "value", "value" : "1" }
        ]
      }
    ]
  }
]
{noformat}
And sure, that's probably only useful to devs and a few advanced and interested 
users, but I think having easy to use tools for that population is also 
important (keeping in mind that I don't pretend either that it's priority 
number one).
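
To illustrate how the two renderings relate, here is a rough Python sketch that turns the JSON structure above into one compact line per row. The function name and the exact line layout are assumptions made for illustration; they only approximate the compact format shown above and are not what any actual tool emits.

```python
def compact_lines(partitions):
    """Render each row as: [partition key + clustering](ts=...) : [name=value], ..."""
    lines = []
    for partition in partitions:
        for row in partition["rows"]:
            # Partition key components followed by clustering components.
            full_key = partition["partition"]["key"] + row["clustering"]
            key_text = "[" + ", ".join('"%s"' % part for part in full_key) + "]"
            tstamp = row["liveness_info"]["tstamp"]
            cells = ", ".join(
                "[%s=%s]" % (cell["name"], cell["value"]) for cell in row["cells"]
            )
            lines.append("%s(ts=%d) : %s" % (key_text, tstamp, cells))
    return lines


# First partition of the JSON example above.
sample = [
    {
        "partition": {"key": ["e", "f"]},
        "rows": [
            {
                "type": "row",
                "clustering": ["g", "h"],
                "liveness_info": {"tstamp": 1451330118497426},
                "cells": [{"name": "value", "value": "2"}],
            }
        ],
    }
]

rendered = compact_lines(sample)
print(rendered[0])
```

The sketch ignores per-cell timestamps, TTLs and non-row entries; it is only meant to show that the compact form can be derived mechanically from the same data.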




[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2016-01-07 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087603#comment-15087603
 ] 

Jonathan Ellis commented on CASSANDRA-7464:
---

Chris, when would the "type" of a row not be "row?"  Is that how you'd support 
static columns?  Maybe that would be better as its own sub-object rather than a 
different type of row.



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2016-01-07 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087789#comment-15087789
 ] 

Chris Lohfink commented on CASSANDRA-7464:
--

I am good with no json2sstable; it's a little non-trivial to write. I could 
change to something like:

./bin/sstableexport json 

and add support for a few other formats.



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2016-01-07 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087785#comment-15087785
 ] 

Chris Lohfink commented on CASSANDRA-7464:
--

Besides a regular row, it can be a static row or a range tombstone.



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2016-01-07 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087796#comment-15087796
 ] 

Robert Stupp commented on CASSANDRA-7464:
-

How about {{sstabledump}}? "Export" implies to me that there's also something 
that can do the import. (Bikeshedding, I know.)



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2015-12-29 Thread Russell Bradberry (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074017#comment-15074017
 ] 

Russell Bradberry commented on CASSANDRA-7464:
--

Personally I would like to see an option for an output method that is more 
digestible by scripts.  Both the old sstable2json and the current tool output 
the entire SSTable as a single pretty-printed array.  This is great for 
looking at it visually, but it requires loading the entire output into 
memory before parsing the JSON.  There are tools that attempt to read a large 
JSON stream and emit objects as they are completed, but these are rather 
cumbersome and difficult to use, and they tend to differ from language to 
language.

What I would propose is a command line option that outputs one 
partition per line (escaping any newlines encountered) without any leading or 
trailing brackets or commas.  This would allow an application to 
read one partition at a time and work on it in a streaming fashion.

I also put my thoughts on this in this github issue: 
https://github.com/tolbertam/sstable-tools/issues/19
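
As a rough illustration of the consumer side, the following Python sketch reads such one-partition-per-line output in a streaming fashion. The field names follow the JSON examples elsewhere in this thread, but the sample lines and the {{iter_partitions}} helper are hypothetical; no such output option exists yet.

```python
import io
import json


def iter_partitions(stream):
    """Yield one parsed partition per non-empty line, without loading the whole file."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Two hypothetical partitions: one JSON object per line, no enclosing array,
# no trailing commas.
sample = io.StringIO(
    '{"partition": {"key": ["a", "b"]}, "rows": []}\n'
    '{"partition": {"key": ["e", "f"]}, "rows": []}\n'
)

keys = [p["partition"]["key"] for p in iter_partitions(sample)]
print(keys)
```

Because each line is a complete JSON document, only one partition needs to be held in memory at a time, and the same loop works on a pipe or a file.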



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2015-12-21 Thread Andy Tolbert (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066841#comment-15066841
 ] 

Andy Tolbert commented on CASSANDRA-7464:
-

[~JoshuaMcKenzie], we'd definitely both be interested and willing :).   I don't 
think it would be too big of an effort to get it working with C*.  The only 
non-cli/logging dependency is jackson, which C* already depends on (albeit an 
older version).

We made a best effort at coming up with an output format that we thought would 
be human readable and familiar to those who previously used sstable2json, but 
we would definitely welcome feedback.



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2015-12-21 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066699#comment-15066699
 ] 

Joshua McKenzie commented on CASSANDRA-7464:


[~cnlwsu] / [~andrew.tolbert]: How much work would it be to get a compatible 
version of your sstable2json with the official C* repo, assuming you're 
interested/willing?

Plenty of us in the community would be more than happy to review / provide 
feedback on integration.



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2015-12-17 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063308#comment-15063308
 ] 

Chris Lohfink commented on CASSANDRA-7464:
--

In the meantime, for people wanting an sstable2json tool, [~andrew.tolbert] and 
I have a version here: https://github.com/tolbertam/sstable-tools that 
currently supports the 3.x versions.

bq. A key differentiator between the storage format of older versions of 
Cassandra and that of Cassandra 3.0 is that an SSTable was previously a 
representation of partitions and their cells (identified by their clustering 
and column name) whereas with Cassandra 3.0 an SSTable now represents 
partitions and their rows. 
You can read about these changes in more detail by visiting this blog post. 
Additional improvements over the sstable2json tool include no longer requiring 
the cassandra.yaml in the classpath with the schema of the sstables loaded. 
Also, by running in client mode this tool will not write to system tables or 
your commit log. It can safely be run as any user anywhere with no side effects.

It's a little easier to run than the older version as well. We are using this 
place as a playground, but it may be a good starting point if updating the tool 
in C* as well.
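
The cell-versus-row distinction quoted above can be illustrated with a small 
sketch. This is only a toy model, not the actual on-disk format or the output 
of any real tool: the dictionary shapes, the field names ("cells", "rows", 
"clustering", "columns") and the helper cells_to_rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical pre-3.0 style view: a partition is a flat list of cells,
# each identified by its clustering values plus a column name.
legacy_partition = {
    "key": "sensor1",
    "cells": [
        (("2015-12-01",), "temp", 20),
        (("2015-12-01",), "humidity", 55),
        (("2015-12-02",), "temp", 18),
    ],
}


def cells_to_rows(partition):
    """Group cells sharing a clustering into rows (hypothetical 3.0 style view)."""
    rows = defaultdict(dict)
    for clustering, column, value in partition["cells"]:
        rows[clustering][column] = value
    ordered = sorted(rows.items(), key=lambda item: item[0])
    return {
        "key": partition["key"],
        "rows": [
            {"clustering": list(clustering), "columns": columns}
            for clustering, columns in ordered
        ],
    }


row_partition = cells_to_rows(legacy_partition)
```

In the row-oriented view the clustering values appear once per row rather 
than once per cell, which is the kind of structure a replacement tool could 
expose more directly.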



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2015-12-03 Thread Russell Bradberry (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038339#comment-15038339
 ] 

Russell Bradberry commented on CASSANDRA-7464:
--

It is absolutely insane that a perfectly working, albeit not the greatest, 
troubleshooting tool was removed and not replaced with anything. We now have no 
way at all to look into the SSTables. This makes troubleshooting production 
problems incredibly difficult. I am curious as to why enough consideration 
wasn't given to hold off the removal of the tool until the new one was ready to 
go.



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2015-12-03 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038529#comment-15038529
 ] 

Sylvain Lebresne commented on CASSANDRA-7464:
-

bq. It is absolutely insane that a perfectly working, albeit not the greatest, 
troubleshooting tool was removed and not replaced with anything.

I think there is a bit of confusion about what happened here, possibly due to 
the phrasing of the description, so I apologize for that. Those tools were more 
or less exposing the sstable layout. 3.0 has completely and profoundly changed 
that layout, so the tools haven't been removed as such: their code has been 
almost entirely invalidated by the changes made in 3.0 and we have to 
re-implement them pretty much from scratch (we can salvage a couple of lines of 
code to deal with json but that's not really the problem). That implies 
figuring out a decent way to expose the new layout (in json or otherwise), and 
I'm suggesting we might as well try to do a better job than we did for the 
previous layout.

The only choice we made was to not delay the 3.0 release until we had time 
to deal with that rewrite, because we figured some users may be fine starting 
to use 3.0 without this (at least in development/for testing since, let's be 
frank here, few people will go into production with 3.x before probably 
3.3/3.4, if not later). I'm personally comfortable that this wasn't an 
unreasonable choice.

But please be assured we haven't forgotten about this. It's just that we are 
pretty early after the 3.0 release and we're somewhat prioritizing fixing our 
known bugs (and fixing our damn dtests) before re-adding this. Which, here 
again, I don't think is completely unreasonable. Hopefully, we'll soon have 
fixed our most pressing bugs and will be able to devote resources to this. But 
if you can't wait, this is open source ... :)



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2015-11-19 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013861#comment-15013861
 ] 

Jeremy Hanna commented on CASSANDRA-7464:
-

I'm +1 on adding something to debug sstables as well.  People also used the two 
tools for data migrations and stripping out unwanted rows from sstables.  For 
the latter, it was for removing large partitions that couldn't be compacted.



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2015-11-19 Thread Sebastian Estevez (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013959#comment-15013959
 ] 

Sebastian Estevez commented on CASSANDRA-7464:
--

For general understanding of how SSTables and compaction work, sstable2json is 
invaluable. 3.0 dropped last week and this issue has not been prioritized.

Here's a couple of great posts that only exist because the community had the 
tooling to introspect sstables:
http://www.planetcassandra.org/blog/qa-starters-guide-to-cassandra/
http://thelastpickle.com/blog/2011/05/15/Deletes-and-Tombstones.html

Folks also used cassandra-cli for this, which I think we also deprecated:
http://opensourceconnections.com/blog/2013/07/24/understanding-how-cql3-maps-to-cassandras-internal-data-structure/

There's lots more where these came from, just wanted to show some good examples 
of why the tools are useful. +1 on this Jira.



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2015-11-19 Thread Wei Deng (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013968#comment-15013968
 ] 

Wei Deng commented on CASSANDRA-7464:
-

If we didn't have the sstable2json tool, it would have been really, really 
difficult to troubleshoot CASSANDRA-7953 in production.



[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable

2015-11-18 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011436#comment-15011436
 ] 

Jeremiah Jordan commented on CASSANDRA-7464:


So we removed these with CASSANDRA-9618 and never replaced them.
