[jira] [Updated] (CASSANDRA-8623) sstablesplit fails *randomly* with Data component is missing

2015-12-02 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-8623:
---
Component/s: Tools

> sstablesplit fails *randomly* with Data component is missing
> 
>
> Key: CASSANDRA-8623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8623
> Project: Cassandra
> Issue Type: Bug
> Components: Tools
> Reporter: Alan Boudreault
> Assignee: Marcus Eriksson
> Labels: qa-resolved
> Fix For: 2.1.3
>
> Attachments: 
> 0001-make-sure-we-finish-compactions-before-waiting-for-d.patch, 
> 8623-v2.patch, output.log, output2.log
>
>
> I'm experiencing an issue with sstablesplit. I would like to understand 
> whether I am doing something wrong or whether there is a bug in the split 
> process. The process fails randomly with the following exception:
> {code}
> ERROR 02:17:36 Error in ThreadPoolExecutor
> java.lang.AssertionError: Data component is missing for sstable./tools/bin/../../data/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-16
> {code}
> See the attached output.log file. The process never stops after this 
> exception, and I've also seen the dataset grow indefinitely (number of 
> sstables).
> * I have not been able to reproduce the issue with a single sstablesplit 
> command, i.e., specifying all files with glob matching.
> * I can reproduce the bug if I call sstablesplit multiple times, one file at 
> a time (the way ccm does).
> Here is the test case file to reproduce the bug:
> https://drive.google.com/file/d/0BwZ_GPM33j6KdVh0NTdkOWV2R1E/view?usp=sharing
> 1. Download the split_issue.tar.gz file. It includes the latest cassandra-2.1 
> branch binaries.
> 2. Extract it.
> 3. cd into the use case directory.
> 4. Download the dataset (2G), just to be sure we have the same thing, and 
> place it in the working directory:
>    https://docs.google.com/uc?id=0BwZ_GPM33j6KV3ViNnpPcVFndUU&export=download
> 5. The first time, run ./test.sh. This will set up and run a test.
> 6. On subsequent runs, you only need to run ./test --no-setup. This resets 
> the dataset to its initial state and re-runs the test. You might have to run 
> the test a few times before hitting the issue, but I can always reproduce it 
> within 2-3 runs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8623) sstablesplit fails *randomly* with Data component is missing

2015-04-23 Thread Alan Boudreault (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Boudreault updated CASSANDRA-8623:
---
Labels: qa-resolved  (was: )

 sstablesplit fails *randomly* with Data component is missing
 

 Key: CASSANDRA-8623
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8623
 Project: Cassandra
  Issue Type: Bug
Reporter: Alan Boudreault
Assignee: Marcus Eriksson
  Labels: qa-resolved
 Fix For: 2.1.3

 Attachments: 
 0001-make-sure-we-finish-compactions-before-waiting-for-d.patch, 
 8623-v2.patch, output.log, output2.log







[jira] [Updated] (CASSANDRA-8623) sstablesplit fails *randomly* with Data component is missing

2015-01-28 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-8623:
---
Reviewer: Yuki Morishita

 sstablesplit fails *randomly* with Data component is missing
 

 Key: CASSANDRA-8623
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8623
 Project: Cassandra
  Issue Type: Bug
Reporter: Alan Boudreault
Assignee: Marcus Eriksson
 Attachments: 
 0001-make-sure-we-finish-compactions-before-waiting-for-d.patch, 
 8623-v2.patch, output.log, output2.log







[jira] [Updated] (CASSANDRA-8623) sstablesplit fails *randomly* with Data component is missing

2015-01-27 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-8623:
---
Attachment: 8623-v2.patch

New patch that avoids submitting tasks if the executor is shut down.
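
For readers following along, the idea of not submitting work to an executor that has already been shut down can be sketched as below. This is only an illustration of the general technique using the standard java.util.concurrent API; it is not the contents of 8623-v2.patch, and the class and method names (GuardedSubmitSketch, submitIfRunning) are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical sketch: skip submission when the executor no longer accepts work.
public class GuardedSubmitSketch {

    // Returns the Future for the task, or null if the task was not accepted.
    static Future<?> submitIfRunning(ExecutorService executor, Runnable task) {
        if (executor.isShutdown()) {
            return null; // executor already shut down: do not submit at all
        }
        try {
            return executor.submit(task);
        } catch (RejectedExecutionException e) {
            // shutdown() may race with the isShutdown() check above, so the
            // rejection is still handled here instead of propagating.
            // (This sketch treats any rejection the same way.)
            return null;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        System.out.println("accepted before shutdown: "
                + (submitIfRunning(executor, () -> { }) != null));
        executor.shutdown();
        System.out.println("accepted after shutdown: "
                + (submitIfRunning(executor, () -> { }) != null));
    }
}
```

The isShutdown() check alone is not race-free, which is why the sketch also catches RejectedExecutionException.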

 sstablesplit fails *randomly* with Data component is missing
 

 Key: CASSANDRA-8623
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8623
 Project: Cassandra
  Issue Type: Bug
Reporter: Alan Boudreault
Assignee: Marcus Eriksson
 Attachments: 
 0001-make-sure-we-finish-compactions-before-waiting-for-d.patch, 
 8623-v2.patch, output.log, output2.log







[jira] [Updated] (CASSANDRA-8623) sstablesplit fails *randomly* with Data component is missing

2015-01-26 Thread Alan Boudreault (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Boudreault updated CASSANDRA-8623:
---
Attachment: output2.log

[~krummas] Tried the patch. I'm now getting this exception: 

{code}
ERROR 18:23:48 Error in ThreadPoolExecutor
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
    at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:61) ~[main/:na]
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) ~[na:1.7.0_72]
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372) ~[na:1.7.0_72]
    at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.execute(DebuggableThreadPoolExecutor.java:150) ~[main/:na]
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110) ~[na:1.7.0_72]
    at org.apache.cassandra.db.compaction.CompactionManager.submitBackground(CompactionManager.java:183) ~[main/:na]
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:239) ~[main/:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_72]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_72]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_72]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_72]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]
{code}

See output2.log for more information.

 sstablesplit fails *randomly* with Data component is missing
 

 Key: CASSANDRA-8623
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8623
 Project: Cassandra
  Issue Type: Bug
Reporter: Alan Boudreault
Assignee: Marcus Eriksson
 Attachments: 
 0001-make-sure-we-finish-compactions-before-waiting-for-d.patch, output.log, 
 output2.log







[jira] [Updated] (CASSANDRA-8623) sstablesplit fails *randomly* with Data component is missing

2015-01-25 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-8623:
---
Attachment: 0001-make-sure-we-finish-compactions-before-waiting-for-d.patch

[~aboudreault] could you test this patch? I have not been able to reproduce 
this myself.

The patch just makes sure all compactions are finished before waiting for the 
deletion tasks.

The same issue could exist in offline cleanup and offline scrub.
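
The ordering described in this comment (finish compactions first, then wait for deletions) can be illustrated with plain java.util.concurrent executors. The sketch below is an assumption-laden toy, not the attached patch: the class name, the two single-thread executors, and the "compaction" and "deletion" tasks are all hypothetical stand-ins for Cassandra's internals.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: drain the compaction executor before the deletion executor,
// because a finishing compaction may still schedule deletion work.
public class OrderedShutdownSketch {

    static List<String> runAndDrain() throws InterruptedException {
        List<String> events = new CopyOnWriteArrayList<>();
        ExecutorService compactionExecutor = Executors.newSingleThreadExecutor();
        ExecutorService deletionExecutor = Executors.newSingleThreadExecutor();

        // A stand-in "compaction" that hands an obsolete file to the deletion executor.
        compactionExecutor.submit(() -> {
            events.add("compaction finished");
            deletionExecutor.submit(() -> {
                events.add("obsolete sstable deleted");
            });
        });

        // 1. Stop accepting compactions and wait for in-flight ones to finish.
        compactionExecutor.shutdown();
        if (!compactionExecutor.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("compactions did not finish in time");
        }

        // 2. Only now wait for deletions; no compaction can enqueue more of them.
        deletionExecutor.shutdown();
        if (!deletionExecutor.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("deletions did not finish in time");
        }
        return events;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAndDrain());
    }
}
```

If the two waits were swapped, the deletion executor could be shut down while a compaction was still about to submit to it, which is the kind of rejection an offline tool would then have to handle.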

 sstablesplit fails *randomly* with Data component is missing
 

 Key: CASSANDRA-8623
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8623
 Project: Cassandra
  Issue Type: Bug
Reporter: Alan Boudreault
Assignee: Marcus Eriksson
 Attachments: 
 0001-make-sure-we-finish-compactions-before-waiting-for-d.patch, output.log







[jira] [Updated] (CASSANDRA-8623) sstablesplit fails *randomly* with Data component is missing

2015-01-14 Thread Alan Boudreault (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Boudreault updated CASSANDRA-8623:
---
Description: 
I'm experiencing an issue related to sstablesplit. I would like to understand 
if I am doing something wrong or there is an issue in the split process. The 
process fails randomly with the following exception:
{code}
ERROR 02:17:36 Error in ThreadPoolExecutor
java.lang.AssertionError: Data component is missing for 
sstable./tools/bin/../../data/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-16
{code}

See attached output.log file. The process never stops after this exception and 
I've also seen the dataset growing indefinitely (number of sstables).  

* I have not been able to reproduce the issue with a single sstablesplit 
command. ie, specifying all files with glob matching.
* I can reproduce the bug if I call multiple sstablesplit one file at the time 
(the way ccm does)

Here is the test case file to reproduce the bug:

https://drive.google.com/file/d/0BwZ_GPM33j6KdVh0NTdkOWV2R1E/view?usp=sharing

1. Download the split_issue.tar.gz file. It includes latest cassandra-2.1 
branch binaries.
2. Extract it
3. CD inside the use case directory
4. Download the dataset (2G) just to be sure we have the same thing, and place 
it in the working directory.
   https://docs.google.com/uc?id=0BwZ_GPM33j6KV3ViNnpPcVFndUU&export=download
5. The first time, run ./test.sh. This will setup and run a test.
6. The next times, you can only run ./test --no-setup . This will only reset 
the dataset as its initial state and re-run the test. You might have to run the 
tests some times before experiencing it... but I'm always able with only 2-3 
runs.


  was:
I'm experiencing an issue related to sstablesplit. I would like to understand 
if I am doing something wrong or there is an issue in the split process. The 
process fails randomly with the following exception:
{code}
ERROR 02:17:36 Error in ThreadPoolExecutor
java.lang.AssertionError: Data component is missing for 
sstable./tools/bin/../../data/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-16
{code}

See attached output.log file. The process never stops after this exception and 
I've also seen the dataset growing indefinitely (number of sstables).  

* I have not been able to reproduce the issue with a single sstablesplit 
command. ie, specifying all files with glob matching.
* I can reproduce the bug if I call multiple sstablesplit on a single file (the 
way ccm does)

Here is the test case file to reproduce the bug:

https://drive.google.com/file/d/0BwZ_GPM33j6KdVh0NTdkOWV2R1E/view?usp=sharing

1. Download the split_issue.tar.gz file. It includes latest cassandra-2.1 
branch binaries.
2. Extract it
3. CD inside the use case directory
4. Download the dataset (2G) just to be sure we have the same thing, and place 
it in the working directory.
   https://docs.google.com/uc?id=0BwZ_GPM33j6KV3ViNnpPcVFndUU&export=download
5. The first time, run ./test.sh. This will setup and run a test.
6. The next times, you can only run ./test --no-setup . This will only reset 
the dataset as its initial state and re-run the test. You might have to run the 
tests some times before experiencing it... but I'm always able with only 2-3 
runs.



 sstablesplit fails *randomly* with Data component is missing
 

 Key: CASSANDRA-8623
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8623
 Project: Cassandra
  Issue Type: Bug
Reporter: Alan Boudreault
Assignee: Marcus Eriksson
 Attachments: output.log



[jira] [Updated] (CASSANDRA-8623) sstablesplit fails *randomly* with Data component is missing

2015-01-14 Thread Alan Boudreault (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Boudreault updated CASSANDRA-8623:
---
Description: 
I'm experiencing an issue related to sstablesplit. I would like to understand 
if I am doing something wrong or there is an issue in the split process. The 
process fails randomly with the following exception:
{code}
ERROR 02:17:36 Error in ThreadPoolExecutor
java.lang.AssertionError: Data component is missing for 
sstable./tools/bin/../../data/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-16
{code}

See attached output.log file. The process never stops after this exception and 
I've also seen the dataset growing indefinitely (number of sstables).  

* I have not been able to reproduce the issue with a single sstablesplit 
command. ie, specifying all files with glob matching.
* I can reproduce the if I call multiple sstablesplit on a single file (the way 
ccm does)

Here is the test case file to reproduce the bug:

https://drive.google.com/file/d/0BwZ_GPM33j6KdVh0NTdkOWV2R1E/view?usp=sharing

1. Download the split_issue.tar.gz file. It includes latest cassandra-2.1 
branch binaries.
2. Extract it
3. CD inside the use case directory
4. Download the dataset (2G) just to be sure we have the same thing, and place 
it in the working directory.
   https://docs.google.com/uc?id=0BwZ_GPM33j6KV3ViNnpPcVFndUU&export=download
5. The first time, run ./test.sh. This will setup and run a test.
6. The next times, you can only run ./test --no-setup . This will only reset 
the dataset as its initial state and re-run the test. You might have to run the 
tests some times before experiencing it... but I'm always able with only 2-3 
runs.


  was:
I'm experiencing an issue related to sstablesplit. I would like to understand 
if I am doing something wrong or there is an issue in the split process. The 
process fails randomly with the following exception:
{code}
ERROR 02:17:36 Error in ThreadPoolExecutor
java.lang.AssertionError: Data component is missing for 
sstable./tools/bin/../../data/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-16
{code}

See attached output.log file. The process never stops after this exception and 
I've also seen the dataset growing indefinitely (number of sstables).  

* I have not been to reproduce the issue with a single sstablesplit command. 
ie, specifying all files with glob matching.
* I can reproduce the if I call multiple sstablesplit on a single file (the way 
ccm does)

Here is the test case file to reproduce the bug:

https://drive.google.com/file/d/0BwZ_GPM33j6KdVh0NTdkOWV2R1E/view?usp=sharing

1. Download the split_issue.tar.gz file. It includes latest cassandra-2.1 
branch binaries.
2. Extract it
3. CD inside the use case directory
4. Download the dataset (2G) just to be sure we have the same thing, and place 
it in the working directory.
   https://docs.google.com/uc?id=0BwZ_GPM33j6KV3ViNnpPcVFndUU&export=download
5. The first time, run ./test.sh. This will setup and run a test.
6. The next times, you can only run ./test --no-setup . This will only reset 
the dataset as its initial state and re-run the test. You might have to run the 
tests some times before experiencing it... but I'm always able with only 2-3 
runs.



 sstablesplit fails *randomly* with Data component is missing
 

 Key: CASSANDRA-8623
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8623
 Project: Cassandra
  Issue Type: Bug
Reporter: Alan Boudreault
Assignee: Marcus Eriksson
 Attachments: output.log



[jira] [Updated] (CASSANDRA-8623) sstablesplit fails *randomly* with Data component is missing

2015-01-14 Thread Alan Boudreault (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Boudreault updated CASSANDRA-8623:
---
Description: 
I'm experiencing an issue related to sstablesplit. I would like to understand 
if I am doing something wrong or there is an issue in the split process. The 
process fails randomly with the following exception:
{code}
ERROR 02:17:36 Error in ThreadPoolExecutor
java.lang.AssertionError: Data component is missing for 
sstable./tools/bin/../../data/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-16
{code}

See attached output.log file. The process never stops after this exception and 
I've also seen the dataset growing indefinitely (number of sstables).  

* I have not been able to reproduce the issue with a single sstablesplit 
command. ie, specifying all files with glob matching.
* I can reproduce the bug if I call multiple sstablesplit on a single file (the 
way ccm does)

Here is the test case file to reproduce the bug:

https://drive.google.com/file/d/0BwZ_GPM33j6KdVh0NTdkOWV2R1E/view?usp=sharing

1. Download the split_issue.tar.gz file. It includes latest cassandra-2.1 
branch binaries.
2. Extract it
3. CD inside the use case directory
4. Download the dataset (2G) just to be sure we have the same thing, and place 
it in the working directory.
   https://docs.google.com/uc?id=0BwZ_GPM33j6KV3ViNnpPcVFndUU&export=download
5. The first time, run ./test.sh. This will setup and run a test.
6. The next times, you can only run ./test --no-setup . This will only reset 
the dataset as its initial state and re-run the test. You might have to run the 
tests some times before experiencing it... but I'm always able with only 2-3 
runs.


  was:
I'm experiencing an issue related to sstablesplit. I would like to understand 
if I am doing something wrong or there is an issue in the split process. The 
process fails randomly with the following exception:
{code}
ERROR 02:17:36 Error in ThreadPoolExecutor
java.lang.AssertionError: Data component is missing for 
sstable./tools/bin/../../data/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-16
{code}

See attached output.log file. The process never stops after this exception and 
I've also seen the dataset growing indefinitely (number of sstables).  

* I have not been able to reproduce the issue with a single sstablesplit 
command. ie, specifying all files with glob matching.
* I can reproduce the if I call multiple sstablesplit on a single file (the way 
ccm does)

Here is the test case file to reproduce the bug:

https://drive.google.com/file/d/0BwZ_GPM33j6KdVh0NTdkOWV2R1E/view?usp=sharing

1. Download the split_issue.tar.gz file. It includes latest cassandra-2.1 
branch binaries.
2. Extract it
3. CD inside the use case directory
4. Download the dataset (2G) just to be sure we have the same thing, and place 
it in the working directory.
   https://docs.google.com/uc?id=0BwZ_GPM33j6KV3ViNnpPcVFndUU&export=download
5. The first time, run ./test.sh. This will setup and run a test.
6. The next times, you can only run ./test --no-setup . This will only reset 
the dataset as its initial state and re-run the test. You might have to run the 
tests some times before experiencing it... but I'm always able with only 2-3 
runs.



 sstablesplit fails *randomly* with Data component is missing
 

 Key: CASSANDRA-8623
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8623
 Project: Cassandra
  Issue Type: Bug
Reporter: Alan Boudreault
Assignee: Marcus Eriksson
 Attachments: output.log

