[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159235#comment-15159235 ]
Yuki Morishita commented on CASSANDRA-7464:
Fixed one more bug (handling case-sensitive column names) and backported to 3.0 as well.
||branch||testall||dtest||
|[7464-3.0|https://github.com/yukim/cassandra/tree/7464-3.0]|[testall|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-7464-3.0-testall/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-7464-3.0-dtest/lastCompletedBuild/testReport/]|
|[7464|https://github.com/yukim/cassandra/tree/7464]|[testall|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-7464-testall/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-7464-dtest/lastCompletedBuild/testReport/]|
Tests are running.
> Replace sstable2json and json2sstable
> -
>
> Key: CASSANDRA-7464
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7464
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Sylvain Lebresne
> Assignee: Chris Lohfink
> Priority: Minor
> Fix For: 3.0.x, 3.x
>
> Attachments: sstable-only.patch, sstabledump.patch
>
>
> Both tools are pretty awful. They are primarily meant for debugging (there are
> much more efficient and convenient ways to import/export data), but their
> output manages to be hard to handle both for humans and for tools (especially
> as soon as you have modern stuff like composites).
> There is value to having tools to export sstable contents into a format that
> is easy to manipulate by humans and tools for debugging, small hacks and
> general tinkering, but sstable2json and json2sstable are not that.
> So I propose that we deprecate those tools and consider writing better
> replacements. It shouldn't be too hard to come up with an output format that
> is more aware of modern concepts like composites, UDTs,
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
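Case-sensitive column names commonly trip up offline tools because CQL folds unquoted identifiers to lower case, so a name such as `Value` only round-trips when it is double-quoted. The sketch below illustrates that rule. `CqlNames.maybeQuote` is a hypothetical helper, not the actual fix in the branches above, and it ignores reserved keywords, which would also need quoting.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not Cassandra's API): renders a column name so it
// survives a trip through CQL. Unquoted identifiers are case-insensitive and
// folded to lower case, so anything that is not already a plain lower-case
// identifier is wrapped in double quotes, with embedded quotes doubled.
// Reserved keywords are NOT handled here.
final class CqlNames {
    private static final Pattern UNQUOTED = Pattern.compile("[a-z][a-z0-9_]*");

    private CqlNames() {}

    static String maybeQuote(String name) {
        if (UNQUOTED.matcher(name).matches()) {
            return name; // safe to emit as-is
        }
        return '"' + name.replace("\"", "\"\"") + '"';
    }
}
```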
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157910#comment-15157910 ]
Jeremiah Jordan commented on CASSANDRA-7464:
It looks like your branch is against trunk; can we add this to 3.0? The cassandra-3.0 branch currently has a regression: there is no sstable dump tool.
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157884#comment-15157884 ]
Yuki Morishita commented on CASSANDRA-7464:
Now that https://github.com/yukim/cassandra/pull/2 is merged, I added one more commit that errors out cleanly when a pre-3.0 SSTable is given, and added more help text.
||branch||testall||dtest||
|[7464|https://github.com/yukim/cassandra/tree/7464]|[testall|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-7464-testall/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-7464-dtest/lastCompletedBuild/testReport/]|
If the tests are good, I will commit.
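Erroring out on a pre-3.0 SSTable needs only a cheap check up front, before any scanner is opened. The sketch below is standalone and works from the file name alone. It assumes that BigFormat version strings compare lexicographically and that "ma" is the first 3.0 version ("la" is 2.2, "ka" is 2.1). `SSTableVersionCheck` is hypothetical and is not Cassandra's `Descriptor` API or the commit mentioned above.

```java
import java.util.Optional;

// Sketch only: decides whether an SSTable data file predates the 3.0 storage
// format by reading the two-letter format version from its bare file name.
// Assumes version strings compare lexicographically and "ma" is the first
// 3.0 version. Not Cassandra's Descriptor API.
final class SSTableVersionCheck {
    static final String FIRST_3_0_VERSION = "ma";

    private SSTableVersionCheck() {}

    // Returns the token just before the first purely numeric token (the
    // generation): "ma" for "ma-1-big-Data.db", "ka" for "ks-tbl-ka-1-Data.db".
    static Optional<String> versionOf(String fileName) {
        String[] parts = fileName.split("-");
        for (int i = 1; i < parts.length; i++) {
            if (parts[i].matches("\\d+")) {
                return Optional.of(parts[i - 1]);
            }
        }
        return Optional.empty();
    }

    // Fails with a short, readable message instead of a deep stack trace.
    static void requireReadable(String fileName) {
        String version = versionOf(fileName).orElseThrow(
                () -> new IllegalArgumentException("Not an SSTable file name: " + fileName));
        if (version.compareTo(FIRST_3_0_VERSION) < 0) {
            throw new IllegalArgumentException("SSTable " + fileName
                    + " is in pre-3.0 format '" + version + "' and cannot be read by this tool");
        }
    }
}
```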
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117772#comment-15117772 ]
Yuki Morishita commented on CASSANDRA-7464:
Thanks for the patch. I pushed your patch plus my suggestions to https://github.com/yukim/cassandra/commits/7464.
* We already have getters in SerializationHeader.
* I'd like to avoid referencing DatabaseDescriptor as much as possible.
Questions:
* Do we need to create KSMetaData and put it into Schema?
* Do currentScanner / position need to be Atomic*?
Some more suggestions:
* If we move the code that opens an SSTable standalone into, say, SSTableReader, we can easily reuse it in other offline tools.
* If a key given with the '-k' option does not exist in the SSTable, the stream terminates with the following error:
{code}
Exception in thread "main" java.util.NoSuchElementException
    at org.apache.cassandra.utils.AbstractIterator.next(AbstractIterator.java:64)
    at org.apache.cassandra.io.sstable.format.big.BigTableScanner.next(BigTableScanner.java:247)
    at org.apache.cassandra.io.sstable.format.big.BigTableScanner.next(BigTableScanner.java:51)
    at org.apache.cassandra.tools.SSTableExport.lambda$main$311(SSTableExport.java:228)
    at org.apache.cassandra.tools.SSTableExport$$Lambda$18/2136288211.apply(Unknown Source)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.SortedOps$RefSortingSink$$Lambda$20/117009527.accept(Unknown Source)
    at java.util.ArrayList.forEach(ArrayList.java:1249)
    at java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:390)
    at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
    at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
    at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:513)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.apache.cassandra.tools.JsonTransformer.toJson(JsonTransformer.java:76)
    at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:255)
{code}
Shouldn't we just skip it and continue?
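The trace shows `next()` being called on a scanner that has nothing to return for the requested key. The toy model below illustrates the "skip and continue" behaviour Yuki suggests. A `NavigableMap` stands in for the SSTable and a per-key iterator stands in for the scanner; `KeyedExport` and its signature are hypothetical and not the tool's code. The fix is simply to check `hasNext()` before `next()`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Toy model of the '-k' path: the map plays the SSTable, the iterator plays
// the scanner. Iterator.next() on an empty iterator throws
// NoSuchElementException (as in the trace); checking hasNext() first lets a
// missing key be skipped instead of aborting the whole export.
final class KeyedExport {
    private KeyedExport() {}

    static List<String> exportKeys(NavigableMap<String, String> sstable, List<String> keys) {
        List<String> out = new ArrayList<>();
        keys.stream().sorted().forEach(key -> {
            // Range covering at most the one requested partition.
            Iterator<Map.Entry<String, String>> scanner =
                    sstable.subMap(key, true, key, true).entrySet().iterator();
            if (!scanner.hasNext()) {
                return; // key not in this SSTable: skip and continue
            }
            Map.Entry<String, String> partition = scanner.next();
            out.add(partition.getKey() + "=" + partition.getValue());
        });
        return out;
    }
}
```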
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117863#comment-15117863 ]
Chris Lohfink commented on CASSANDRA-7464:
> Do we need to create KSMetaData and put it into Schema?
I did at the time to prevent an NPE, but only because I was actually using CQL to create the CFMetaData, and that looked up the reference in Schema. That isn't necessary now, so it's good to remove it.
> Do currentScanner / position need to be Atomic*?
Absolutely not. I just used them as wrappers for mutability within lambdas. I wanted a single ISSTableScanner, created once, for both the list-of-keys and the whole-SSTable scans; then none of that would have been necessary. But creating the collection of Ranges for the DataRange instead of Bounds turned into a mess (is there a way of turning a Bounds into a Range? Overriding all the Token implementations to support increment/decrement etc. was intimidating).
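For context on "wrappers for mutability within lambdas": Java lambdas may only capture effectively final locals, so state updated inside `forEach` needs some mutable box. The sketch below uses a hypothetical `LambdaCapture` class to compare three equivalent options. Atomic types work but suggest concurrency that is not there; a one-element array or a plain loop does the same job on a sequential stream.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Lambdas can only capture effectively final locals. Each method counts
// items to show a different way of mutating state from inside iteration.
final class LambdaCapture {
    private LambdaCapture() {}

    // The AtomicInteger reference is final; its contents are mutable.
    static int countWithAtomic(List<String> items) {
        AtomicInteger count = new AtomicInteger();
        items.forEach(item -> count.incrementAndGet());
        return count.get();
    }

    // Same effect with a single-element array as the box.
    static int countWithArrayBox(List<String> items) {
        int[] count = {0};
        items.forEach(item -> count[0]++);
        return count[0];
    }

    // No lambda at all, so an ordinary local variable suffices.
    static int countWithLoop(List<String> items) {
        int count = 0;
        for (String ignored : items) {
            count++;
        }
        return count;
    }
}
```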
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116784#comment-15116784 ]
Chris Lohfink commented on CASSANDRA-7464:
For a debugging format that would also satisfy the "one per line" request, we can do that pretty easily using UnfilteredRow.toString. So instead of
{code}
[
  {
    "partition" : { "key" : [ "127.0.0.1-getWriteLatencyHisto" ], "position" : 19385620 },
    "rows" : [
      { "type" : "row", "position" : 19385664, "clustering" : [ "694621867" ], "cells" : [ { "name" : "value", "value" : "00", "tstamp" : 1452861829846001, "ttl" : 604800, "expires_at" : 1453466629, "expired" : true } ] },
      { "type" : "row", "position" : 19385686, "clustering" : [ "694621927" ], "cells" : [ { "name" : "value", "value" : "00", "tstamp" : 1452861769124000, "ttl" : 604800, "expires_at" : 1453466569, "expired" : true } ] },
      { "type" : "row", "position" : 19385708, "clustering" : [ "694621987" ], "cells" : [ { "name" : "value", "value" : "00", "tstamp" : 1452861709303002, "ttl" : 604800, "expires_at" : 1453466509, "expired" : true } ] },
      { "type" : "row", "position" : 19385730, "clustering" : [ "694622047" ], "cells" : [ { "name" : "value", "value" : "00", "tstamp" : 1452861649548002, "ttl" : 604800, "expires_at" : 1453466449, "expired" : true } ] },
      ...
{code}
it can be
{code}
[127.0.0.1-getWriteLatencyHisto]@19385620 Row[info=[ts=-9223372036854775808] ]: 694621867 | [value=00 ts=1452861829846001 ttl=604800 ldt=1453466629]
[127.0.0.1-getWriteLatencyHisto]@19385686 Row[info=[ts=-9223372036854775808] ]: 694621927 | [value=00 ts=1452861769124000 ttl=604800 ldt=1453466569]
[127.0.0.1-getWriteLatencyHisto]@19385708 Row[info=[ts=-9223372036854775808] ]: 694621987 | [value=00 ts=1452861709303002 ttl=604800 ldt=1453466509]
[127.0.0.1-getWriteLatencyHisto]@19385730 Row[info=[ts=-9223372036854775808] ]: 694622047 | [value=00 ts=1452861649548002 ttl=604800 ldt=1453466449]
...
{code}
This would also make it easy to split files for Hadoop jobs etc., since there would be one CQL row per line (easing the wide-partition issues with the compact output mentioned above). It would also tie the rendering to something already maintained for debug logging, so refactoring or storage changes would need little additional work. I'm kind of a fan of both, so I implemented a {{-d}} option (it could use a better name) for the one-row-per-line "debuggy" compact output (worth noting: this is very hard to read if there are a lot of cells). I also added the current position from the scanner to the results (see the examples above). Until CASSANDRA-9587 lands, I had to add an alternative that does not print clustering key names in the toString, since they aren't available anywhere; this is a little hacky but can be removed once we have the names.
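The sketch below shows the one-row-per-line idea in miniature. `DebugRow` and `DebugCell` are hypothetical, simplified records that stand in for Cassandra's row classes. The sketch does not reproduce `UnfilteredRow.toString`, and it omits the liveness info. Because each CQL row becomes exactly one line, the output can be split at any newline, for example into Hadoop input splits.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-ins for a row and its cells (not Cassandra's
// Unfiltered/Row classes). Liveness info is left out for brevity.
record DebugCell(String name, String hexValue, long timestamp, int ttl, int localDeletionTime) {}

record DebugRow(String partitionKey, long position, String clustering, List<DebugCell> cells) {}

final class CompactDump {
    private CompactDump() {}

    // Renders one CQL row as one line: [key]@position clustering | [cells...]
    static String render(DebugRow row) {
        String cells = row.cells().stream()
                .map(c -> "[" + c.name() + "=" + c.hexValue()
                        + " ts=" + c.timestamp()
                        + " ttl=" + c.ttl()
                        + " ldt=" + c.localDeletionTime() + "]")
                .collect(Collectors.joining(", "));
        return "[" + row.partitionKey() + "]@" + row.position()
                + " " + row.clustering() + " | " + cells;
    }
}
```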
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091990#comment-15091990 ]
Jonathan Ellis commented on CASSANDRA-7464:
bq. my personal preference would be to stop calling that sstable2json, but rather have a tool whose purpose is to inspect sstables, since that's what people use it for in the first place anyway. That tool could then ultimately output multiple formats, json being only one of them and we could have something more readable otherwise (and could very well have some more useful informations like file offsets and such which could help a lot when debugging
I don't see a point in creating an ad-hoc format along with JSON. There's no reason we couldn't include file offsets in JSON.
bq. I'd also personally prefer not re-adding json2sstable. I think that tool is a lot less justified since you can easily create sstable (from whatever you want) using CQLSSTableWriter.
bq. How about sstabledump?
+1
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092023#comment-15092023 ]
Sylvain Lebresne commented on CASSANDRA-7464:
bq. I don't see a point in creating an ad-hoc format along with json
If I'm just "debugging" an sstable (because, say, a user has a bug that I'm investigating), i.e. I'm interested in checking the output manually rather than processing it, then I personally very much prefer something like:
{noformat}
["e", "f", "g", "h"](ts=1451330118497426) : [value=1](ts=1451330118497426), [other_col='foo']
["a", "b", "c", "d"](ts=1451330118479576) : [value=3], [other_col='bar'](ttl=3)
{noformat}
to
{noformat}
[
  { "partition" : { "key" : [ "e", "f" ] },
    "rows" : [ { "type" : "row", "clustering" : [ "g", "h" ], "liveness_info" : { "tstamp" : 1451330118497426 }, "cells" : [ { "name" : "value", "value" : "2" } ] } ] },
  { "partition" : { "key" : [ "a", "b" ] },
    "rows" : [ { "type" : "row", "clustering" : [ "c", "d" ], "liveness_info" : { "tstamp" : 1451330118479576 }, "cells" : [ { "name" : "value", "value" : "1" } ] } ] }
]
{noformat}
And sure, that's probably only useful to devs and a few advanced, interested users, but I think having easy-to-use tools for that population is also important (keeping in mind that I don't pretend it's priority number one either).
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087603#comment-15087603 ]
Jonathan Ellis commented on CASSANDRA-7464:
Chris, when would the "type" of a row not be "row"? Is that how you'd support static columns? Maybe those would be better as their own sub-object rather than a different type of row.
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087789#comment-15087789 ]
Chris Lohfink commented on CASSANDRA-7464:
I'm fine with dropping json2sstable; it's a little nontrivial to write. I could change it to something like {{./bin/sstableexport json}} and add support for a few other formats.
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087785#comment-15087785 ]
Chris Lohfink commented on CASSANDRA-7464:
The "type" can also be a static row or a range tombstone.
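To make the two layouts concrete: Chris's answer means that not every entry in a partition is a plain clustered row. The sketch below uses hypothetical `PartitionEntry` records and illustrative type strings, not the tool's actual output. It contrasts tagging every entry with a type against Jonathan's suggestion of pulling the static row out as its own sub-object, so that only clustered entries carry a tag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of partition contents in the 3.0 storage engine: at most
// one static row, then clustered rows and range tombstone markers. The type
// strings are illustrative only.
interface PartitionEntry {}

record StaticRow(List<String> cells) implements PartitionEntry {}

record ClusteredRow(List<String> clustering, List<String> cells) implements PartitionEntry {}

record RangeTombstoneMarker(List<String> bound, boolean isStart, long deletedAt) implements PartitionEntry {}

final class EntryTypes {
    private EntryTypes() {}

    // Option 1: every entry carries a "type" discriminator.
    static String typeOf(PartitionEntry e) {
        if (e instanceof StaticRow) return "static";
        if (e instanceof ClusteredRow) return "row";
        if (e instanceof RangeTombstoneMarker) return "range_tombstone_bound";
        throw new IllegalArgumentException("unknown entry " + e);
    }

    // Option 2: the static row is emitted separately, so only the
    // clustered entries need a type tag.
    static List<String> clusteredTypes(List<PartitionEntry> entries) {
        List<String> types = new ArrayList<>();
        for (PartitionEntry e : entries) {
            if (!(e instanceof StaticRow)) {
                types.add(typeOf(e));
            }
        }
        return types;
    }
}
```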
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087796#comment-15087796 ]
Robert Stupp commented on CASSANDRA-7464:
How about {{sstabledump}}? "Export" implies to me that there's also something that can do the import. (Bikeshedding, I know.)
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074017#comment-15074017 ]
Russell Bradberry commented on CASSANDRA-7464:
Personally, I would like to see an option for an output format that is more digestible by scripts. Both the old sstable2json and this one output the entire SSTable as a single pretty-printed array. This is great for reading visually, but it requires loading the entire output into memory before parsing it as JSON. There are tools that attempt to read a large JSON stream and emit objects as they complete, but they are rather cumbersome and difficult to use, and they also tend to differ from language to language. What I would propose is a command-line option that outputs one partition per line (escaping any newlines encountered), without any leading or trailing brackets or separating commas. This would allow an application to read one partition at a time and work on it in a streaming fashion. I also put my thoughts on this in this GitHub issue: https://github.com/tolbertam/sstable-tools/issues/19
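The sketch below shows the consumer side of the one-partition-per-line proposal. Each line is a complete JSON document, so a reader holds only one partition in memory at a time. A compliant JSON serializer must already escape control characters, including newlines, inside string values, so the producer only has to disable pretty-printing and drop the enclosing brackets and commas. `PartitionLines` is a hypothetical name, and real code would hand each line to a JSON parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

// Streams a "one partition per line" dump: each non-blank line is handed to
// the handler as a standalone JSON document, and the count is returned.
final class PartitionLines {
    private PartitionLines() {}

    static long forEachPartition(Reader in, Consumer<String> handler) throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) {
                    continue; // tolerate a trailing empty line
                }
                handler.accept(line); // parse with any JSON library here
                count++;
            }
        }
        return count;
    }
}
```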
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066841#comment-15066841 ]
Andy Tolbert commented on CASSANDRA-7464:
[~JoshuaMcKenzie], we'd definitely both be interested and willing :). I don't think it would be a big effort to get it working with C*: the only non-CLI/logging dependency is Jackson, which C* already depends on (albeit an older version). We made a best effort at coming up with an output format that we thought would be human-readable and familiar to those who previously used sstable2json, but we would definitely welcome feedback.
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066699#comment-15066699 ]
Joshua McKenzie commented on CASSANDRA-7464:
[~cnlwsu] / [~andrew.tolbert]: How much work would it be to get a compatible version of your sstable2json into the official C* repo, assuming you're interested/willing? Plenty of us in the community would be more than happy to review and provide feedback on the integration.
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063308#comment-15063308 ] Chris Lohfink commented on CASSANDRA-7464: -- In meantime for people wanting a sstable2json tool, [~andrew.tolbert] and I have a version here: https://github.com/tolbertam/sstable-tools that supports the 3.x versions currently. bq. A key differentiator between the storage format between older verisons of Cassandra and Cassandra 3.0 is that an SSTable was previously a representation of partitions and their cells (identified by their clustering and column name) whereas with Cassandra 3.0 an SSTable now represents partitions and their rows. You can read about these changes in more detail by visiting this blog post. Additional improvements over the sstable2json tool includes no longer requiring the cassandra.yaml in classpath with the schema of the sstables loaded. Also by running in client mode this tool will not write to system tables or your commit log. It can safely be run as any user anywhere with no side effects. Its a little easier to run then older version as well. We are using this place as a playground but it may be a good starter if updating tool in C* as well. > Replace sstable2json and json2sstable > - > > Key: CASSANDRA-7464 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7464 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Priority: Minor > Fix For: 3.x > > > Both tools are pretty awful. They are primarily meant for debugging (there is > much more efficient and convenient ways to do import/export data), but their > output manage to be hard to handle both for humans and for tools (especially > as soon as you have modern stuff like composites). > There is value to having tools to export sstable contents into a format that > is easy to manipulate by human and tools for debugging, small hacks and > general tinkering, but sstable2json and json2sstable are not that. 
> So I propose that we deprecate those tools and consider writing better > replacements. It shouldn't be too hard to come up with an output format that > is more aware of modern concepts like composites, UDTs, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
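To make the cells-versus-rows distinction described above concrete, here is a minimal Python sketch. The `Cell` and `Row` classes and the `cells_to_rows` helper are hypothetical, heavily simplified models (no timestamps, TTLs or tombstones), not Cassandra's actual classes; the sketch only shows how a partition's cells, each named by (clustering, column name) in the pre-3.0 view, regroup into rows identified by their clustering in the 3.0 view.

```python
from dataclasses import dataclass, field
from itertools import groupby

# Hypothetical, simplified models for illustration only; real Cassandra
# cells and rows also carry timestamps, TTLs, tombstones, etc.

@dataclass(frozen=True)
class Cell:
    """Pre-3.0 view: a cell named by (clustering values, column name)."""
    clustering: tuple
    column: str
    value: object

@dataclass
class Row:
    """3.0 view: a row identified by its clustering, holding its columns."""
    clustering: tuple
    columns: dict = field(default_factory=dict)

def cells_to_rows(cells):
    """Group a partition's cells into rows by clustering.

    ASSUMPTION: the cells are already sorted by clustering, as they are
    within a partition, so consecutive cells sharing a clustering form
    exactly one row.
    """
    rows = []
    for clustering, group in groupby(cells, key=lambda c: c.clustering):
        rows.append(Row(clustering, {c.column: c.value for c in group}))
    return rows

# Example partition for a table like:
#   CREATE TABLE t (k int, c int, a text, b text, PRIMARY KEY (k, c))
partition_cells = [
    Cell((1,), "a", "x"),
    Cell((1,), "b", "y"),
    Cell((2,), "a", "z"),
]
for row in cells_to_rows(partition_cells):
    print(row)
```

Three cells become two rows here: the two cells with clustering `(1,)` collapse into a single row, which is roughly why a dump of the 3.0 format reads more naturally as partitions of rows than as flat lists of cells.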
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038339#comment-15038339 ] Russell Bradberry commented on CASSANDRA-7464: -- It is absolutely insane that a perfectly working, albeit not the greatest, troubleshooting tool was removed and not replaced with anything. We now have no way at all to look into the SSTables. This makes troubleshooting production problems incredibly difficult. I am curious as to why enough consideration wasn't given to holding off the removal of the tool until the new one was ready to go.
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038529#comment-15038529 ] Sylvain Lebresne commented on CASSANDRA-7464: - bq. It is absolutely insane that a perfectly working, albeit not the greatest, troubleshooting tool was removed and not replaced with anything. I think there is a bit of confusion about what happened here, possibly due to the phrasing of the description, so I apologize for that. Those tools more or less exposed the sstable layout. 3.0 has completely and profoundly changed that layout, so the tools weren't removed so much as their code was almost entirely invalidated by the changes made in 3.0, and we have to re-implement them pretty much from scratch (we can salvage a couple of lines of code to deal with JSON, but that's not really the problem). That implies figuring out a decent way to expose the new layout (in JSON or otherwise), and I'm suggesting we might as well try to do a better job than we did for the previous layout. The only choice we made was not to delay the 3.0 release until we had time to deal with that rewrite, because we figured some users may be fine starting to use 3.0 without this (at least in development/for testing since, let's be frank here, few people will go into production with 3.x before probably 3.3/3.4, if not later). I'm personally comfortable that this wasn't an unreasonable choice. But please be assured we haven't forgotten about this. It's just that we are pretty early after the 3.0 release and we're somewhat prioritizing fixing our known bugs (and fixing our damn dtests) before re-adding this. Which, here again, I don't think is completely unreasonable. Hopefully, we'll soon have fixed our most pressing bugs and will be able to devote resources to this. But if you can't wait, this is open source ... :)
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013861#comment-15013861 ] Jeremy Hanna commented on CASSANDRA-7464: - I'm +1 on adding something to debug sstables as well. People also used the two tools for data migrations and for stripping unwanted rows out of sstables; the latter was used to remove large partitions that couldn't be compacted.
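The filtering half of the "strip out an oversized partition" workflow mentioned above can be sketched in Python as below. This is a minimal sketch under stated assumptions: the dump shape (a JSON array of `{"partition": {"key": [...]}, "rows": [...]}` objects) and the helper names `largest_partitions` and `strip_partitions` are hypothetical, and writing the filtered data back into an SSTable (what json2sstable used to do) is not shown.

```python
import json

# ASSUMPTION: a hypothetical dump format, a JSON array of partitions like
#   {"partition": {"key": ["k1"]}, "rows": [ ... ]}
# Check the real output of whatever dump tool you use before relying on this.

def largest_partitions(dump_text, n=1):
    """Return the keys (as tuples) of the n partitions with the most rows."""
    partitions = json.loads(dump_text)
    ranked = sorted(partitions, key=lambda p: len(p.get("rows", [])), reverse=True)
    return [tuple(p["partition"]["key"]) for p in ranked[:n]]

def strip_partitions(dump_text, unwanted_keys):
    """Return a new dump with every partition whose key is in unwanted_keys dropped."""
    partitions = json.loads(dump_text)
    kept = [p for p in partitions if tuple(p["partition"]["key"]) not in unwanted_keys]
    return json.dumps(kept, indent=2)

if __name__ == "__main__":
    sample = json.dumps([
        {"partition": {"key": ["hot"]}, "rows": [{}, {}, {}]},
        {"partition": {"key": ["ok"]}, "rows": [{}]},
    ])
    worst = largest_partitions(sample, n=1)  # the partition with the most rows
    print(worst)
    print(strip_partitions(sample, set(worst)))
```

Row count is used here only as a cheap stand-in for partition size; a real investigation would more likely look at bytes on disk.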
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013959#comment-15013959 ] Sebastian Estevez commented on CASSANDRA-7464: -- For a general understanding of how SSTables and compaction work, sstable2json is invaluable. 3.0 dropped last week and this issue has not been prioritized. Here are a couple of great posts that only exist because the community had the tooling to introspect sstables: http://www.planetcassandra.org/blog/qa-starters-guide-to-cassandra/ http://thelastpickle.com/blog/2011/05/15/Deletes-and-Tombstones.html Folks also used cassandra-cli for this, which I think we also deprecated: http://opensourceconnections.com/blog/2013/07/24/understanding-how-cql3-maps-to-cassandras-internal-data-structure/ There are lots more where these came from; I just wanted to show some good examples of why the tools are useful. +1 on this Jira.
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013968#comment-15013968 ] Wei Deng commented on CASSANDRA-7464: - If we didn't have the sstable2json tool, it would have been really, really difficult to troubleshoot CASSANDRA-7953 in production.
[jira] [Commented] (CASSANDRA-7464) Replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011436#comment-15011436 ] Jeremiah Jordan commented on CASSANDRA-7464: So we removed these with CASSANDRA-9618 and never replaced them.