[jira] [Commented] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries
[ https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374981#comment-16374981 ]

Michael Kjellman commented on CASSANDRA-14247:
----------------------------------------------

i'll try to take a look shortly

> SASI tokenizer for simple delimiter based entries
> -------------------------------------------------
>
>                 Key: CASSANDRA-14247
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14247
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: sasi
>            Reporter: mck
>            Assignee: mck
>            Priority: Major
>             Fix For: 4.0, 3.11.x
>
>
> Currently SASI offers only two tokenizer options:
>  - NonTokenizingAnalyzer
>  - StandardAnalyzer
> The latter is built upon Snowball, powerful for human languages but overkill
> for simple tokenization.
> A simple tokenizer is proposed here. The need for this arose as a workaround
> of CASSANDRA-11182, and to avoid the disk usage explosion when having to
> resort to {{CONTAINS}}. See https://github.com/openzipkin/zipkin/issues/1861
> Example use of this would be:
> {code}
> CREATE CUSTOM INDEX span_annotation_query_idx
> ON zipkin2.span (annotation_query) USING
> 'org.apache.cassandra.index.sasi.SASIIndex'
> WITH OPTIONS = {
> 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.DelimiterTokenizerAnalyzer',
> 'delimiter': '░',
> 'case_sensitive': 'true',
> 'mode': 'prefix',
> 'analyzed': 'true'};
> {code}
> Original credit for this work goes to https://github.com/zuochangan

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
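For readers following CASSANDRA-14247, here is a minimal sketch of what delimiter-based tokenization means in practice. It is not the patch under review: the class name DelimiterTokenizerSketch, the tokenize method and the sample value are all made up for illustration, and a real SASI analyzer would plug into the index's analyzer machinery rather than return a list. It only shows a column value being split on a single delimiter character into the terms that would be indexed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; NOT the analyzer proposed in the ticket.
final class DelimiterTokenizerSketch
{
    // U+2591, the delimiter character used in the ticket's CREATE CUSTOM INDEX example.
    static final char DELIMITER = '\u2591';

    // Splits value on one delimiter character and returns the non-empty terms in order.
    static List<String> tokenize(String value, char delimiter)
    {
        List<String> terms = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= value.length(); i++)
        {
            // A term ends at a delimiter or at the end of the input.
            if (i == value.length() || value.charAt(i) == delimiter)
            {
                if (i > start) // skip empty terms from leading, trailing or doubled delimiters
                    terms.add(value.substring(start, i));
                start = i + 1;
            }
        }
        return terms;
    }

    public static void main(String[] args)
    {
        // Made-up value, loosely in the spirit of the zipkin annotation_query column.
        String value = DELIMITER + "error" + DELIMITER + "http.method=GET" + DELIMITER;
        System.out.println(tokenize(value, DELIMITER));
    }
}
```

Each term would presumably be indexed on its own, which is what would let a 'prefix' mode index match individual entries without falling back to {{CONTAINS}}.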
[jira] [Commented] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries
[ https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374868#comment-16374868 ]

Pavel Yaskevich commented on CASSANDRA-14247:
---------------------------------------------

LGTM, but I think [~mkjellman] should take a look as well.
[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS
[ https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374864#comment-16374864 ]

Jon Haddad commented on CASSANDRA-8460:
---------------------------------------

Hey [~Lerh Low]!

First off, let me thank you for being open to alternative ideas, especially after writing a large chunk of code. Not everyone is willing to take a step back and consider other options; I really appreciate it.

{quote}
Maybe you have stumbled upon the case where data has been resurrected in JBOD configuration in your experiences...? In theory since splitting by token range there should be no more such cases. It is safe.
{quote}

I had actually misremembered how CASSANDRA-6696 was implemented. Looking back at the code and testing it manually, I see the memtables are flushed to their respective disks initially. It's nice to be wrong about this.

There's quite a bit going on here. I did a quick search but didn't see anything related to disk failure policy. One thing that's going to be a bit tricky is that unless you have a 1:1 fast disk to archive disk relationship, you end up with some weird situations that can show up when using {{disk_failure_policy: best_effort}}, which is what CASSANDRA-6696 was all about in the first place. If you lose your fast disk, will you still be able to query data that's on the archive disk for a given token range? It seems to me that using this feature would have to imply {{disk_failure_policy: stop}}, since the failure of either the archive or one of the disks in {{data_file_directories}} would result in incorrect results being returned.

lvmcache uses [dm-cache|https://www.kernel.org/doc/Documentation/device-mapper/cache.txt] under the hood which keeps hot pages in memory. It shipped in Linux kernel 3.9, which was released in April 2013.

Using lvmcache, if you were to create a logical volume per disk, with the SSD as your fast disk configured as a writethrough, you'd still honor the disk failure policy in the case of an archival or SSD failure, as well as have the flexibility of keeping any hot data readily available and not explicitly needing to move it off to another device when it's still active. It adapts to your read and write patterns rather than requiring configuration. Take a look at the [man page|http://man7.org/linux/man-pages/man7/lvmcache.7.html], it's pretty awesome.

> Make it possible to move non-compacting sstables to slow/big storage in DTCS
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8460
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Assignee: Lerh Chuan Low
>            Priority: Major
>              Labels: doc-impacting, dtcs
>             Fix For: 4.x
>
>
> It would be nice if we could configure DTCS to have a set of extra data
> directories where we move the sstables once they are older than
> max_sstable_age_days.
> This would enable users to have a quick, small SSD for hot, new data, and big
> spinning disks for data that is rarely read and never compacted.
[jira] [Updated] (CASSANDRA-14257) Add a separate Installing Cassandra section on the menu and move the content there
[ https://issues.apache.org/jira/browse/CASSANDRA-14257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Brotman updated CASSANDRA-14257:
----------------------------------------
    Summary: Add a separate Installing Cassandra section on the menu and move the content there  (was: Add a seperate Installing Cassandra section on the menu and move the content there)

> Add a separate Installing Cassandra section on the menu and move the content there
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14257
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14257
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Documentation and Website
>            Reporter: Kenneth Brotman
>            Priority: Major
>
> Above the top level menu entitled “Configuring Cassandra”
> should be a top level menu title called “Installing Cassandra” and this web
> page should be moved there:
> [http://cassandra.apache.org/doc/latest/getting_started/installing.html]
[jira] [Created] (CASSANDRA-14257) Add a seperate Installing Cassandra section on the menu and move the content there
Kenneth Brotman created CASSANDRA-14257:
-------------------------------------------

             Summary: Add a seperate Installing Cassandra section on the menu and move the content there
                 Key: CASSANDRA-14257
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14257
             Project: Cassandra
          Issue Type: Improvement
          Components: Documentation and Website
            Reporter: Kenneth Brotman


Above the top level menu entitled “Configuring Cassandra” should be a top level menu title called “Installing Cassandra” and this web page should be moved there: [http://cassandra.apache.org/doc/latest/getting_started/installing.html]
[jira] [Created] (CASSANDRA-14256) Renaming Reporting Bugs and Contributions to just Reporting Bugs
Kenneth Brotman created CASSANDRA-14256:
-------------------------------------------

             Summary: Renaming Reporting Bugs and Contributions to just Reporting Bugs
                 Key: CASSANDRA-14256
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14256
             Project: Cassandra
          Issue Type: Improvement
          Components: Documentation and Website
            Reporter: Kenneth Brotman


The top level menu title and web page title of [http://cassandra.apache.org/doc/latest/bugs.html] should be renamed from “Reporting Bugs and Contributions” to just “Reporting Bugs”. There is already a separate section for “Contributing to Cassandra”, and it has the information on contributing.
[jira] [Created] (CASSANDRA-14255) Moving the Configuring Cassandra web page
Kenneth Brotman created CASSANDRA-14255:
-------------------------------------------

             Summary: Moving the Configuring Cassandra web page
                 Key: CASSANDRA-14255
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14255
             Project: Cassandra
          Issue Type: Improvement
          Components: Documentation and Website
            Reporter: Kenneth Brotman


The web page called Configuring Cassandra at [http://cassandra.apache.org/doc/latest/getting_started/configuring.html] should be moved from under the “Getting Started” menu item to under the “Configuring Cassandra” menu item.
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374200#comment-16374200 ]

Thomas Steinmaurer commented on CASSANDRA-13929:
------------------------------------------------

The following does not include the latest patches from Feb 22, but shows the last 30 days on a single node (m4.2xlarge, Xmx12G, CMS) out of our 9-node loadtest environment, including the various tests/patches we have applied.

!cassandra_heapcpu_memleak_patching_test_30d.png|width=1280!

* Blue line => AVG heap utilization
* Orange line => AVG CPU utilization (not really related, as compaction is most likely overlaying anything else)

The chart covers the following timeframes:

||Timeframe||Deployment||Comment/Result||
|Jan 25 - Feb 1|Cassandra 3.11 public + Netty 4.0.55|(!) Heap utilization increase|
|Feb 1 - Feb 6|Cassandra 3.11 public + Netty 4.0.55 + limiting Netty capacity per Thread|(!) Heap utilization increase|
|Feb 6 - Feb 14|Cassandra 3.11 public + Netty 4.0.55 + my recycleHandle = null patch|(/) Heap utilization stable|
|Feb 14 - Feb 23|Cassandra 3.11 public + Netty 4.0.55 + *without* recycleHandle = null patch + first [~jay.zhuang] patch from Feb 13|(/) Heap utilization stable, but slightly increased compared to the previous timeframe|

This is very high-level (although from the field) compared to [~jay.zhuang]'s tests and benchmarks, but possibly useful for a decision process, hopefully with the fix being included in 3.11.3.

Thanks guys!

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-13929
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Thomas Steinmaurer
>            Assignee: Jay Zhuang
>            Priority: Major
>             Fix For: 3.11.x
>
>         Attachments: cassandra_3.11.0_min_memory_utilization.jpg,
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg,
> cassandra_3.11.1_mat_dominator_classes.png,
> cassandra_3.11.1_mat_dominator_classes_FIXED.png,
> cassandra_3.11.1_snapshot_heaputilization.png,
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png,
> cassandra_heapcpu_memleak_patching_test_30d.png,
> dtest_example_80_request.png, dtest_example_80_request_fix.png,
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 =>
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) =>
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class =>
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the
> _recycleHandle_ member (making it non-final). In
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
>     if (recycleHandle != null)
>     {
>         this.cleanup();
>         builderRecycler.recycle(this, recycleHandle);
>         recycleHandle = null; // ADDED
>     }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10
> hours uptime, no sign of the previously offending class in MAT anymore =>
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can't say if this has any other side effects etc., but I doubt it.
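To make the "unreference the handle on recycle" idea from the ticket description a little more concrete, here is a small self-contained sketch. It deliberately uses neither Netty's Recycler nor Cassandra's BTree.Builder: Pool, Handle and Builder below are hypothetical stand-ins, and this toy pool hands out a fresh handle on every lease, which is an assumption of the sketch and may well differ from how Netty manages handles. It only shows the general pattern: once an object has been returned to the pool it drops its handle reference, so the handle is no longer reachable through the pooled object and a second recycle() call becomes a no-op.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-ins; neither Netty's Recycler nor Cassandra's BTree.Builder.
final class RecycleHandleSketch
{
    // Toy object pool backed by a stack of idle builders.
    static final class Pool
    {
        private final Deque<Builder> stack = new ArrayDeque<>();

        Builder get()
        {
            Builder b = stack.poll();
            if (b == null)
                b = new Builder();
            b.handle = new Handle(this); // assumption of this sketch: fresh handle per lease
            return b;
        }

        int pooled()
        {
            return stack.size();
        }
    }

    // A handle knows which pool its object goes back to.
    static final class Handle
    {
        private final Pool pool;

        Handle(Pool pool)
        {
            this.pool = pool;
        }

        void recycle(Builder b)
        {
            pool.stack.push(b);
        }
    }

    static final class Builder
    {
        Handle handle; // non-final so that it can be cleared

        void recycle()
        {
            if (handle != null)
            {
                handle.recycle(this);
                handle = null; // drop the reference, mirroring the line marked ADDED in the ticket
            }
        }
    }

    public static void main(String[] args)
    {
        Pool pool = new Pool();
        Builder b = pool.get();
        b.recycle();
        b.recycle(); // no-op: the handle was cleared by the first call
        System.out.println("pooled builders: " + pool.pooled());
    }
}
```

Whether clearing the handle has other side effects in the real code path is exactly the open question raised in the ticket; the sketch does not answer it.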
[jira] [Updated] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Steinmaurer updated CASSANDRA-13929:
-------------------------------------------
    Attachment: cassandra_heapcpu_memleak_patching_test_30d.png
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374138#comment-16374138 ]

Norman Maurer commented on CASSANDRA-13929:
-------------------------------------------

Just a general comment...

The Recycler only makes sense to use if creating the object is considered very expensive and/or if you create / destroy a lot of these very frequently, which usually means thousands per second. So if this is not the case here, I think it is completely reasonable to not use the Recycler at all...

As I have no real idea about the use case, I am just leaving this here as a general comment :)
[jira] [Resolved] (CASSANDRA-14191) Bootstrap/Streaming fails with missing CompressionInfo
[ https://issues.apache.org/jira/browse/CASSANDRA-14191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mck resolved CASSANDRA-14191.
-----------------------------
    Resolution: Cannot Reproduce

Closing this ticket as 'cannot reproduce', as I doubt more information on it will arise.

If it does, or anyone has any thoughts or suspicions about it, please do re-open the ticket and speak up.

> Bootstrap/Streaming fails with missing CompressionInfo
> ------------------------------------------------------
>
>                 Key: CASSANDRA-14191
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14191
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: mck
>            Priority: Major
>
> Multiple attempts at bootstrapping a new node fail, with streaming failing
> (either hanging or stopping the bootstrap node) always from the same node.
>
> The original node throws the following exception during the streaming process:
> {noformat}
> ERROR [STREAM-OUT-/10.83.74.236:47220] 2018-01-24 19:25:22,532 StreamSession.java:512 - [Stream #90c1c8b0-013a-11e8-b5f0-9323de372ca2] Streaming error occurred on session with peer X.X.X.X
> java.lang.AssertionError: null
>     at org.apache.cassandra.io.compress.CompressionMetadata$Chunk.<init>(CompressionMetadata.java:473) ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>     at org.apache.cassandra.io.compress.CompressionMetadata.getChunksForSections(CompressionMetadata.java:287) ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>     at org.apache.cassandra.streaming.messages.FileMessageHeader$FileMessageHeaderSerializer.serialize(FileMessageHeader.java:172) ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>     at org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:82) ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>     at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:49) ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>     at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41) ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:377) ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:349) ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>     at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
> {noformat}
> The bootstrapping node's reaction to this failure is
> {noformat}
> ERROR [STREAM-IN-/10.83.74.234:7001] 2018-01-24 19:25:22,957 StreamSession.java:512 - [Stream #90c1c8b0-013a-11e8-b5f0-9323de372ca2] Streaming error occurred on session with peer X.X.X.X
> java.io.EOFException: null
>     at java.io.DataInputStream.readInt(DataInputStream.java:392) ~[na:1.8.0_151]
>     at org.apache.cassandra.streaming.compress.CompressionInfo$CompressionInfoSerializer.deserialize(CompressionInfo.java:68) ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>     at org.apache.cassandra.streaming.compress.CompressionInfo$CompressionInfoSerializer.deserialize(CompressionInfo.java:47) ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>     at org.apache.cassandra.streaming.messages.FileMessageHeader$FileMessageHeaderSerializer.deserialize(FileMessageHeader.java:188) ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>     at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:42) ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>     at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:38) ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>     at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:56) ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>     at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:276) ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>     at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
> {noformat}
> Other observations:
>  - always the one node that fails,
>  - multiple bootstrap attempts (using different ec2 instances) all fail,
>  - the exception occurs to {{\-tmp-}} sstables that have no CompressionInfo
> component,
>  - it's a different {{\-tmp-}} sstable each time,
>  - running either {{nodetool cleanup}} or {{nodetool scrub}} made no
> difference.
[jira] [Commented] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries
[ https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374075#comment-16374075 ]

mck commented on CASSANDRA-14247:
---------------------------------

[~xedin], are you able to spare a review?