[jira] [Commented] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries

2018-02-23 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374981#comment-16374981
 ] 

Michael Kjellman commented on CASSANDRA-14247:
--

I'll try to take a look shortly.

> SASI tokenizer for simple delimiter based entries
> -
>
> Key: CASSANDRA-14247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14247
> Project: Cassandra
> Issue Type: Improvement
> Components: sasi
> Reporter: mck
> Assignee: mck
> Priority: Major
> Fix For: 4.0, 3.11.x
>
>
> Currently SASI offers only two tokenizer options:
>  - NonTokenizingAnalyzer
>  - StandardAnalyzer
> The latter is built upon Snowball, which is powerful for human languages but 
> overkill for simple tokenization.
> A simple tokenizer is proposed here. The need for this arose as a workaround 
> for CASSANDRA-11182, and to avoid the disk usage explosion when having to 
> resort to {{CONTAINS}}. See https://github.com/openzipkin/zipkin/issues/1861
> Example use of this would be:
> {code}
> CREATE CUSTOM INDEX span_annotation_query_idx
> ON zipkin2.span (annotation_query)
> USING 'org.apache.cassandra.index.sasi.SASIIndex'
> WITH OPTIONS = {
>     'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.DelimiterTokenizerAnalyzer',
>     'delimiter': '░',
>     'case_sensitive': 'true',
>     'mode': 'prefix',
>     'analyzed': 'true'
> };
> {code}
> Original credit for this work goes to https://github.com/zuochangan
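
For readers unfamiliar with the idea, the kind of splitting a delimiter-based tokenizer performs can be sketched roughly as below. This is only an illustration and not the code attached to the ticket: the class name {{DelimiterSplitSketch}}, the method {{tokenize}}, the choice to skip empty tokens, and the sample value are all assumptions made for this sketch. The delimiter is written as the escape {{\u2591}}, which is the '░' character from the CQL example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of delimiter-based tokenization; not the class proposed in the ticket. */
public class DelimiterSplitSketch {

    /**
     * Splits value on a single delimiter character.
     * Skipping empty tokens is an assumption of this sketch.
     */
    public static List<String> tokenize(String value, char delimiter) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= value.length(); i++) {
            // A token ends at a delimiter or at the end of the input.
            if (i == value.length() || value.charAt(i) == delimiter) {
                if (i > start) {
                    tokens.add(value.substring(start, i));
                }
                start = i + 1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed sample value: entries joined and wrapped by the delimiter U+2591.
        String annotationQuery = "\u2591error\u2591http.path=/api\u2591";
        System.out.println(tokenize(annotationQuery, '\u2591'));
    }
}
```

With an index in 'prefix' mode, each token produced this way would presumably be indexed as its own term, which is what avoids indexing every substring as {{CONTAINS}} does.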



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries

2018-02-23 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374868#comment-16374868
 ] 

Pavel Yaskevich commented on CASSANDRA-14247:
-

LGTM, but I think [~mkjellman] should take a look as well.







[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-02-23 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374864#comment-16374864
 ] 

Jon Haddad commented on CASSANDRA-8460:
---

Hey [~Lerh Low]!  

First off, let me thank you for being open to alternative ideas, especially 
after writing a large chunk of code.  Not everyone is willing to take a step 
back and consider other options, and I really appreciate it.

{quote}
Maybe you have stumbled upon the case where data has been resurrected in JBOD 
configuration in your experiences...? In theory since splitting by token range 
there should be no more such cases. It is safe.
{quote}

I had actually misremembered how CASSANDRA-6696 was implemented.  Looking back 
at the code and testing it manually I see the memtables are flushed to their 
respective disks initially.  It's nice to be wrong about this.

There's quite a bit going on here.  I did a quick search but didn't see anything 
related to disk failure policy.  One thing that's going to be a bit tricky is 
that unless you have a 1:1 fast disk to archive disk relationship, you end up 
with some weird situations that can show up when using {{disk_failure_policy: 
best_effort}}, which is what CASSANDRA-6696 was all about in the first place.  
If you lose your fast disk, will you still be able to query data that's on the 
archive disk for a given token range?  

It seems to me that using this feature would have to imply 
{{disk_failure_policy: stop}}, since either the failure of the archive or one 
of the disks in {{data_file_directories}} would result in incorrect results 
being returned.

lvmcache uses 
[dm-cache|https://www.kernel.org/doc/Documentation/device-mapper/cache.txt] 
under the hood, which keeps hot blocks on the fast device.  dm-cache shipped in 
Linux kernel 3.9, which was released in April 2013.  

Using lvmcache, if you were to create a logical volume per disk, with the SSD 
as your fast disk configured as a writethrough cache, you'd still honor the 
disk failure policy in the case of an archival or SSD failure, as well as have 
the flexibility of keeping any hot data readily available and not explicitly 
needing to move it off to another device when it's still active.  It adapts to 
your read and write patterns rather than requiring configuration.  Take a look 
at the [man page|http://man7.org/linux/man-pages/man7/lvmcache.7.html]; it's 
pretty awesome.

> Make it possible to move non-compacting sstables to slow/big storage in DTCS
> 
>
> Key: CASSANDRA-8460
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Marcus Eriksson
> Assignee: Lerh Chuan Low
> Priority: Major
> Labels: doc-impacting, dtcs
> Fix For: 4.x
>
>
> It would be nice if we could configure DTCS to have a set of extra data 
> directories where we move the sstables once they are older than 
> max_sstable_age_days. 
> This would enable users to have a quick, small SSD for hot, new data, and big 
> spinning disks for data that is rarely read and never compacted.






[jira] [Updated] (CASSANDRA-14257) Add a separate Installing Cassandra section on the menu and move the content there

2018-02-23 Thread Kenneth Brotman (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Brotman updated CASSANDRA-14257:

Summary: Add a separate Installing Cassandra section on the menu and move 
the content there  (was: Add a seperate Installing Cassandra section on the 
menu and move the content there)

> Add a separate Installing Cassandra section on the menu and move the content 
> there
> --
>
> Key: CASSANDRA-14257
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14257
> Project: Cassandra
> Issue Type: Improvement
> Components: Documentation and Website
> Reporter: Kenneth Brotman
> Priority: Major
>
> Above the top-level menu entitled “Configuring Cassandra” there should be a 
> top-level menu titled “Installing Cassandra”, and this web page should be 
> moved there: 
> [http://cassandra.apache.org/doc/latest/getting_started/installing.html]






[jira] [Created] (CASSANDRA-14257) Add a seperate Installing Cassandra section on the menu and move the content there

2018-02-23 Thread Kenneth Brotman (JIRA)
Kenneth Brotman created CASSANDRA-14257:
---

 Summary: Add a seperate Installing Cassandra section on the menu 
and move the content there
 Key: CASSANDRA-14257
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14257
 Project: Cassandra
  Issue Type: Improvement
  Components: Documentation and Website
Reporter: Kenneth Brotman


Above the top-level menu entitled “Configuring Cassandra” there should be a 
top-level menu titled “Installing Cassandra”, and this web page should be 
moved there: 
[http://cassandra.apache.org/doc/latest/getting_started/installing.html]






[jira] [Created] (CASSANDRA-14256) Renaming Reporting Bugs and Contributions to just Reporting Bugs

2018-02-23 Thread Kenneth Brotman (JIRA)
Kenneth Brotman created CASSANDRA-14256:
---

 Summary: Renaming Reporting Bugs and Contributions to just 
Reporting Bugs
 Key: CASSANDRA-14256
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14256
 Project: Cassandra
  Issue Type: Improvement
  Components: Documentation and Website
Reporter: Kenneth Brotman


The top-level menu title and web page title of 
[http://cassandra.apache.org/doc/latest/bugs.html] should be renamed from 
“Reporting Bugs and Contributions” to just “Reporting Bugs”. There is already a 
separate section for “Contributing to Cassandra”, and it has the information on 
contributing.






[jira] [Created] (CASSANDRA-14255) Moving the Configuring Cassandra web page

2018-02-23 Thread Kenneth Brotman (JIRA)
Kenneth Brotman created CASSANDRA-14255:
---

 Summary: Moving the Configuring Cassandra web page
 Key: CASSANDRA-14255
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14255
 Project: Cassandra
  Issue Type: Improvement
  Components: Documentation and Website
Reporter: Kenneth Brotman


The web page called Configuring Cassandra at 
[http://cassandra.apache.org/doc/latest/getting_started/configuring.html] 
should be moved from under the “Getting Started” menu item to under the 
“Configuring Cassandra” menu item.






[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-23 Thread Thomas Steinmaurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374200#comment-16374200
 ] 

Thomas Steinmaurer commented on CASSANDRA-13929:


The following does not include the latest patches from Feb 22, but shows the 
last 30 days on a single node (m4.2xlarge, Xmx12G, CMS) out of our 9-node 
loadtest environment, including the various tests/patches we have applied.
 !cassandra_heapcpu_memleak_patching_test_30d.png|width=1280!
 * Blue line => AVG heap utilization
 * Orange line => AVG CPU utilization (not really related, as compaction most 
likely dominates everything else)

The chart covers the following timeframes:
||Timeframe||Deployment||Comment/Result||
|Jan 25 - Feb 1|Cassandra 3.11 public + Netty 4.0.55|(!) Heap utilization increase|
|Feb 1 - Feb 6|Cassandra 3.11 public + Netty 4.0.55 + limiting Netty capacity per Thread|(!) Heap utilization increase|
|Feb 6 - Feb 14|Cassandra 3.11 public + Netty 4.0.55 + my recycleHandle = null patch|(/) Heap utilization stable|
|Feb 14 - Feb 23|Cassandra 3.11 public + Netty 4.0.55 + *without* recycleHandle = null patch + first [~jay.zhuang] patch from Feb 13|(/) Heap utilization stable, but slightly higher than before|

This is very high-level (although from the field) compared to [~jay.zhuang]'s 
tests and benchmarks, but possibly useful for the decision process; hopefully a 
fix will be included in 3.11.3. Thanks guys!

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Thomas Steinmaurer
> Assignee: Jay Zhuang
> Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> cassandra_heapcpu_memleak_patching_test_30d.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; most likely the leak is just 
> more visible now that CASSANDRA-13754 is fixed
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
>     if (recycleHandle != null)
>     {
>         this.cleanup();
>         builderRecycler.recycle(this, recycleHandle);
>         recycleHandle = null; // ADDED
>     }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can't say if this has any other side effects, but I doubt it.
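
The control flow of the {{recycle()}} change quoted above can be tried out in isolation with a toy model like the one below. This is a sketch under loud assumptions: {{ToyHandle}} and {{ToyBuilder}} are hypothetical stand-ins invented here, not Netty's {{Recycler.Handle}} or Cassandra's {{BTree.Builder}}, and the sketch says nothing about why the real heap growth occurred. It only shows that, once the handle reference is cleared, a repeated {{recycle()}} call becomes a no-op.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model only; it does not reproduce Netty's Recycler internals. */
public class RecycleHandleSketch {

    /** Hypothetical stand-in for a pool handle (not io.netty.util.Recycler.Handle). */
    static final class ToyHandle {
        private final Deque<ToyBuilder> pool;

        ToyHandle(Deque<ToyBuilder> pool) {
            this.pool = pool;
        }

        void recycle(ToyBuilder builder) {
            pool.push(builder);
        }
    }

    /** Hypothetical stand-in for BTree.Builder. */
    static final class ToyBuilder {
        // Non-final so that recycle() can clear it.
        private ToyHandle recycleHandle;

        ToyBuilder(ToyHandle recycleHandle) {
            this.recycleHandle = recycleHandle;
        }

        /** Returns true if this call handed the builder back to the pool. */
        boolean recycle() {
            if (recycleHandle == null) {
                return false; // handle already cleared; nothing to do
            }
            recycleHandle.recycle(this);
            recycleHandle = null; // mirrors the line marked ADDED in the ticket
            return true;
        }
    }

    public static void main(String[] args) {
        Deque<ToyBuilder> pool = new ArrayDeque<>();
        ToyBuilder builder = new ToyBuilder(new ToyHandle(pool));
        System.out.println(builder.recycle()); // first call returns the builder to the pool
        System.out.println(builder.recycle()); // second call does nothing
        System.out.println(pool.size());
    }
}
```

Note that in this toy model a builder taken back out of the pool no longer has a handle, so it cannot be returned a second time. Whether the same holds for the real {{BTree.Builder}} depends on code not shown in the ticket.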






[jira] [Updated] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-23 Thread Thomas Steinmaurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Steinmaurer updated CASSANDRA-13929:
---
Attachment: cassandra_heapcpu_memleak_patching_test_30d.png







[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-23 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374138#comment-16374138
 ] 

Norman Maurer commented on CASSANDRA-13929:
---

Just a general comment: the Recycler only makes sense to use if creating the 
object is considered very expensive and/or if you create and destroy a lot of 
these very frequently, which usually means thousands per second.  So if this is 
not the case here, I think it is completely reasonable to not use the Recycler 
at all... As I have no real idea about the use case, I am just leaving this 
here as a general comment :)







[jira] [Resolved] (CASSANDRA-14191) Bootstrap/Streaming fails with missing CompressionInfo

2018-02-23 Thread mck (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mck resolved CASSANDRA-14191.
-
Resolution: Cannot Reproduce

Closing this ticket as 'cannot reproduce', as I doubt more information on it 
will arise.

If it does, or anyone has any thoughts or suspicions about it, please do 
re-open the ticket and speak up.

> Bootstrap/Streaming fails with missing CompressionInfo
> --
>
> Key: CASSANDRA-14191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14191
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: mck
> Priority: Major
>
> Multiple attempts at bootstrapping a new node fail, with streaming failing 
> (either hanging or stopping the bootstrap node) always from the same node.
>  
> The original node throws the following exception during the streaming process:
> {noformat}
> ERROR [STREAM-OUT-/10.83.74.236:47220] 2018-01-24 19:25:22,532 
> StreamSession.java:512 - [Stream #90c1c8b0-013a-11e8-b5f0-9323de372ca2] 
> Streaming error occurred on session with peer X.X.X.X
> java.lang.AssertionError: null
>   at 
> org.apache.cassandra.io.compress.CompressionMetadata$Chunk.<init>(CompressionMetadata.java:473)
>  ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>   at 
> org.apache.cassandra.io.compress.CompressionMetadata.getChunksForSections(CompressionMetadata.java:287)
>  ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>   at 
> org.apache.cassandra.streaming.messages.FileMessageHeader$FileMessageHeaderSerializer.serialize(FileMessageHeader.java:172)
>  ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>   at 
> org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:82)
>  ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>   at 
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:49)
>  ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>   at 
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41)
>  ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>   at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
>  ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>   at 
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:377)
>  ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>   at 
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:349)
>  ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
> {noformat}
> The bootstrapping node's reaction to this failure is
> {noformat}
> ERROR [STREAM-IN-/10.83.74.234:7001] 2018-01-24 19:25:22,957 
> StreamSession.java:512 - [Stream #90c1c8b0-013a-11e8-b5f0-9323de372ca2] 
> Streaming error occurred on session with peer X.X.X.X
> java.io.EOFException: null
>   at java.io.DataInputStream.readInt(DataInputStream.java:392) 
> ~[na:1.8.0_151]
>   at 
> org.apache.cassandra.streaming.compress.CompressionInfo$CompressionInfoSerializer.deserialize(CompressionInfo.java:68)
>  ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>   at 
> org.apache.cassandra.streaming.compress.CompressionInfo$CompressionInfoSerializer.deserialize(CompressionInfo.java:47)
>  ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>   at 
> org.apache.cassandra.streaming.messages.FileMessageHeader$FileMessageHeaderSerializer.deserialize(FileMessageHeader.java:188)
>  ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>   at 
> org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:42)
>  ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>   at 
> org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:38)
>  ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>   at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:56)
>  ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>   at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:276)
>  ~[cassandra-all-2.1.18.1463.jar:2.1.18.1463]
>   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
> {noformat}
> Other observations:
>  - it is always the same node that fails,
>  - multiple bootstrap attempts (using different ec2 instances) all fail,
>  - the exception occurs for {{\-tmp-}} sstables that have no CompressionInfo 
> component,
>  - it's a different {{\-tmp-}} sstable each time,
>  - running either {{nodetool cleanup}} or {{nodetool scrub}} made no 
> difference.





[jira] [Commented] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries

2018-02-23 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374075#comment-16374075
 ] 

mck commented on CASSANDRA-14247:
-

[~xedin], are you able to spare a review?



