Re: missing rows while importing data using sstable loader

2016-02-05 Thread Victor Chen
Arindam,

What can you share regarding the source from which you are importing data?
Is it a separate cassandra cluster? If so, how many nodes and datacenters?
What is RF (replication factor) of source cluster? How certain are you that
the rows indeed exist in the set of sstables which you are loading into
sstableloader? I ask because, as a hypothetical, if you load sstables from a
single node of a 3-node, single-DC source cluster with RF=2, you won't be
importing the full set of data that existed in the source cluster. In that
case, you'd need to load sstables from at least two nodes to import a full
set of the data, because of the RF. (If RF were 3, a single node would be
enough. If RF=1, you'd need all sstables from all three nodes.)
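
To make the arithmetic concrete, here is a rough Python sketch of that reasoning. It is only a toy model and assumes SimpleStrategy with one token per node, where each token range is stored on its owner plus the next RF-1 nodes around the ring. The function names are made up for illustration and are not part of any Cassandra tool.

```python
from itertools import combinations


def ranges_held(node, n_nodes, rf):
    """Token ranges stored on `node` in the toy ring model.

    Assumption: range r is owned by node r and replicated to the next
    rf - 1 nodes clockwise (SimpleStrategy, one token per node).
    """
    return {r for r in range(n_nodes) if (node - r) % n_nodes < rf}


def min_nodes_for_full_copy(n_nodes, rf):
    """Smallest k such that ANY k nodes together hold every token range."""
    all_ranges = set(range(n_nodes))
    for k in range(1, n_nodes + 1):
        if all(
            set().union(*(ranges_held(n, n_nodes, rf) for n in combo)) == all_ranges
            for combo in combinations(range(n_nodes), k)
        ):
            return k
    return n_nodes


for rf in (1, 2, 3):
    print(f"3 nodes, RF={rf}: load sstables from {min_nodes_for_full_copy(3, rf)} node(s)")
```

In this model the answer works out to N - RF + 1 nodes, which matches the three cases above. A real cluster with vnodes or NetworkTopologyStrategy needs a closer look at the actual ring.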

On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
arindam.choudh...@ackstorm.com> wrote:

> Hi,
>
> I am importing data to a new cassandra cluster using sstableloader. The
> sstableloader runs without any warning or error. But I am missing around
> 1000 rows.
>
> Any feedback will be highly appreciated.
>
> Kind Regards,
> Arindam Choudhury
>


Re: missing rows while importing data using sstable loader

2016-02-05 Thread Jack Krupansky
I sent a message to DataStax Docs to add this nodetool flush suggestion to
the doc for sstableloader.

-- Jack Krupansky

On Fri, Feb 5, 2016 at 3:35 AM, Romain Hardouin  wrote:

> > What is the best practise to create sstables?
>
> When you run a "nodetool flush" Cassandra persists all the memtables on
> disk, i.e. it produces sstables.
> (You can create sstables by yourself thanks to  CQLSSTableWriter, but I
> don't think it was the point of your question.)
>


Re: missing rows while importing data using sstable loader

2016-02-01 Thread Arindam Choudhury
What is the best practise to create sstables?

On 1 February 2016 at 15:21, Romain Hardouin  wrote:

> Did you run "nodetool flush" on the source node? If not, the missing rows
> could be in memtables.
>


Re: missing rows while importing data using sstable loader

2016-02-01 Thread Romain Hardouin
Did you run "nodetool flush" on the source node? If not, the missing rows could 
be in memtables.
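
A minimal sketch of the order of operations, written as a small Python helper that only builds the two command lines (it does not run them). The target host name and the sstable directory are placeholders, not values from this thread; as far as I know sstableloader takes the keyspace and table from the last two components of the directory path, so adjust the path to your layout.

```python
import shlex


def flush_then_load_commands(keyspace, table, sstable_dir, target_host):
    """Return the two commands in the order they should be run.

    1. nodetool flush: run on the source node so memtables are written
       out as sstables before anything is copied or streamed.
    2. sstableloader: stream the sstables in `sstable_dir` to the target.
    """
    flush = ["nodetool", "flush", keyspace, table]
    load = ["sstableloader", "-d", target_host, sstable_dir]
    return [flush, load]


# Placeholder values: "target-node" and the path are hypothetical.
commands = flush_then_load_commands(
    "mordor",
    "things_values_meta",
    "/path/to/mordor/things_values_meta",
    "target-node",
)
for argv in commands:
    print(shlex.join(argv))
```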


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Arindam Choudhury
Hi Romain,

The RF was set to 2.

I changed it to one.

 CREATE KEYSPACE mordor WITH replication = {'class' : 'SimpleStrategy',
'replication_factor' : 1}  AND durable_writes = true;

I re-inserted the columns, but rows are still missing.

Regards,
Arindam

On 29 January 2016 at 15:14, Romain Hardouin  wrote:

> Hi,
>
> I assume a RF > 1. Right?
> What is the consistency level you used? cqlsh use ONE by default.
> Try:
> cqlsh> CONSISTENCY ALL
> And run your query again.
>
> Best,
> Romain
>
>
> On Friday, 29 January 2016 at 13:45, Arindam Choudhury <
> arindam.choudh...@ackstorm.com> wrote:
>
>
> Hi Kai,
>
> The table schema is:
>
> CREATE TABLE mordor.things_values_meta (
> thing_id text,
> key text,
> bucket_timestamp timestamp,
> total_rows counter,
> PRIMARY KEY ((thing_id, key), bucket_timestamp)
> ) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'min_threshold': '4', 'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32'}
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99.0PERCENTILE';
>
>
> I am just running "select count(*) from things_values_meta ;" to get the
> count.
>
> Regards,
> Arindam
>
> On 29 January 2016 at 13:39, Kai Wang  wrote:
>
> Arindam,
>
> what's the table schema and what does your query to retrieve the rows look
> like?
>
> On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
> arindam.choudh...@ackstorm.com> wrote:
>
> Hi,
>
> I am importing data to a new cassandra cluster using sstableloader. The
> sstableloader runs without any warning or error. But I am missing around
> 1000 rows.
>
> Any feedback will be highly appreciated.
>
> Kind Regards,
> Arindam Choudhury
>
>
>
>
>
>


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Romain Hardouin
Hi,
I assume a RF > 1. Right?
What is the consistency level you used? cqlsh uses ONE by default.
Try:
cqlsh> CONSISTENCY ALL
And run your query again.

Best,
Romain

On Friday, 29 January 2016 at 13:45, Arindam Choudhury wrote:
 

 Hi Kai,

The table schema is:

CREATE TABLE mordor.things_values_meta (
    thing_id text,
    key text,
    bucket_timestamp timestamp,
    total_rows counter,
    PRIMARY KEY ((thing_id, key), bucket_timestamp)
) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32'}
    AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';


I am just running "select count(*) from things_values_meta ;" to get the count.

Regards,
Arindam

On 29 January 2016 at 13:39, Kai Wang  wrote:

Arindam,

what's the table schema and what does your query to retrieve the rows look like?
 
On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury 
 wrote:

Hi,

I am importing data to a new cassandra cluster using sstableloader. The 
sstableloader runs without any warning or error. But I am missing around 1000 
rows.

Any feedback will be highly appreciated. 

Kind Regards,
Arindam Choudhury






  

Re: missing rows while importing data using sstable loader

2016-01-29 Thread Arindam Choudhury
I will check the output of nodetool cfstats.

It's from version 2.1.2 to version 2.1.9.

On 29 January 2016 at 16:02, Jack Krupansky 
wrote:

> Are these sstables from an existing Cassandra cluster or generated by a
> program?
>
> If the former, do a nodetool tablestats or cfstats to get the sstable
> count and compare it to both the number of sstables that the loader is
> reading from and the number that end up in the target cluster.
>
> What Cassandra version did the sstables come from and what version are you
> importing into?
>
>
> -- Jack Krupansky
>
> On Fri, Jan 29, 2016 at 9:34 AM, Arindam Choudhury <
> arindam.choudh...@ackstorm.com> wrote:
>
>> Hi Romain,
>>
>> The RF was set to 2.
>>
>> I changed it to one.
>>
>>  CREATE KEYSPACE mordor WITH replication = {'class' : 'SimpleStrategy',
>> 'replication_factor' : 1}  AND durable_writes = true;
>>
>> re-inserted the columns, still missing rows.
>>
>> Regards,
>> Arindam
>>
>> On 29 January 2016 at 15:14, Romain Hardouin  wrote:
>>
>>> Hi,
>>>
>>> I assume a RF > 1. Right?
>>> What is the consistency level you used? cqlsh use ONE by default.
>>> Try:
>>> cqlsh> CONSISTENCY ALL
>>> And run your query again.
>>>
>>> Best,
>>> Romain
>>>
>>>
>>> On Friday, 29 January 2016 at 13:45, Arindam Choudhury <
>>> arindam.choudh...@ackstorm.com> wrote:
>>>
>>>
>>> Hi Kai,
>>>
>>> The table schema is:
>>>
>>> CREATE TABLE mordor.things_values_meta (
>>> thing_id text,
>>> key text,
>>> bucket_timestamp timestamp,
>>> total_rows counter,
>>> PRIMARY KEY ((thing_id, key), bucket_timestamp)
>>> ) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
>>> AND bloom_filter_fp_chance = 0.01
>>> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>>> AND comment = ''
>>> AND compaction = {'min_threshold': '4', 'class':
>>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>>> 'max_threshold': '32'}
>>> AND compression = {'sstable_compression':
>>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>> AND dclocal_read_repair_chance = 0.1
>>> AND default_time_to_live = 0
>>> AND gc_grace_seconds = 864000
>>> AND max_index_interval = 2048
>>> AND memtable_flush_period_in_ms = 0
>>> AND min_index_interval = 128
>>> AND read_repair_chance = 0.0
>>> AND speculative_retry = '99.0PERCENTILE';
>>>
>>>
>>> I am just running "select count(*) from things_values_meta ;" to get the
>>> count.
>>>
>>> Regards,
>>> Arindam
>>>
>>> On 29 January 2016 at 13:39, Kai Wang  wrote:
>>>
>>> Arindam,
>>>
>>> what's the table schema and what does your query to retrieve the rows
>>> look like?
>>>
>>> On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
>>> arindam.choudh...@ackstorm.com> wrote:
>>>
>>> Hi,
>>>
>>> I am importing data to a new cassandra cluster using sstableloader. The
>>> sstableloader runs without any warning or error. But I am missing around
>>> 1000 rows.
>>>
>>> Any feedback will be highly appreciated.
>>>
>>> Kind Regards,
>>> Arindam Choudhury
>>>
>>>
>>>
>>>
>>>
>>>
>>
>


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Jack Krupansky
Are these sstables from an existing Cassandra cluster or generated by a
program?

If the former, do a nodetool tablestats or cfstats to get the sstable count
and compare it to both the number of sstables that the loader is reading
from and the number that end up in the target cluster.

What Cassandra version did the sstables come from and what version are you
importing into?


-- Jack Krupansky

On Fri, Jan 29, 2016 at 9:34 AM, Arindam Choudhury <
arindam.choudh...@ackstorm.com> wrote:

> Hi Romain,
>
> The RF was set to 2.
>
> I changed it to one.
>
>  CREATE KEYSPACE mordor WITH replication = {'class' : 'SimpleStrategy',
> 'replication_factor' : 1}  AND durable_writes = true;
>
> re-inserted the columns, still missing rows.
>
> Regards,
> Arindam
>
> On 29 January 2016 at 15:14, Romain Hardouin  wrote:
>
>> Hi,
>>
>> I assume a RF > 1. Right?
>> What is the consistency level you used? cqlsh use ONE by default.
>> Try:
>> cqlsh> CONSISTENCY ALL
>> And run your query again.
>>
>> Best,
>> Romain
>>
>>
>> On Friday, 29 January 2016 at 13:45, Arindam Choudhury <
>> arindam.choudh...@ackstorm.com> wrote:
>>
>>
>> Hi Kai,
>>
>> The table schema is:
>>
>> CREATE TABLE mordor.things_values_meta (
>> thing_id text,
>> key text,
>> bucket_timestamp timestamp,
>> total_rows counter,
>> PRIMARY KEY ((thing_id, key), bucket_timestamp)
>> ) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
>> AND bloom_filter_fp_chance = 0.01
>> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>> AND comment = ''
>> AND compaction = {'min_threshold': '4', 'class':
>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>> 'max_threshold': '32'}
>> AND compression = {'sstable_compression':
>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>> AND dclocal_read_repair_chance = 0.1
>> AND default_time_to_live = 0
>> AND gc_grace_seconds = 864000
>> AND max_index_interval = 2048
>> AND memtable_flush_period_in_ms = 0
>> AND min_index_interval = 128
>> AND read_repair_chance = 0.0
>> AND speculative_retry = '99.0PERCENTILE';
>>
>>
>> I am just running "select count(*) from things_values_meta ;" to get the
>> count.
>>
>> Regards,
>> Arindam
>>
>> On 29 January 2016 at 13:39, Kai Wang  wrote:
>>
>> Arindam,
>>
>> what's the table schema and what does your query to retrieve the rows
>> look like?
>>
>> On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
>> arindam.choudh...@ackstorm.com> wrote:
>>
>> Hi,
>>
>> I am importing data to a new cassandra cluster using sstableloader. The
>> sstableloader runs without any warning or error. But I am missing around
>> 1000 rows.
>>
>> Any feedback will be highly appreciated.
>>
>> Kind Regards,
>> Arindam Choudhury
>>
>>
>>
>>
>>
>>
>


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Arindam Choudhury
I am counting the rows with "select count(*) from
mordor.things_values_meta;"

For testing, I am loading from a one-node cluster into a one-node cluster.

On 29 January 2016 at 16:20, Jack Krupansky 
wrote:

> And how are you counting the rows? With a query? If, so, what is the
> query. Using nodetool cfstats (estimated) key count? Or... what?
>
> Are the tokens for the missing rows is the same range and a distinct range
> from the rest of the data in the original cluster?
>
> How many nodes in the original cluster?
>
> -- Jack Krupansky
>
> On Fri, Jan 29, 2016 at 10:12 AM, Arindam Choudhury <
> arindam.choudh...@ackstorm.com> wrote:
>
>> I will check the output of nodetool cfstats.
>>
>> Its from version 2.1.2 to version 2.1.9.
>>
>> On 29 January 2016 at 16:02, Jack Krupansky 
>> wrote:
>>
>>> Are these sstables from an existing Cassandra cluster or generated by a
>>> program?
>>>
>>> If the former, do a nodetool tablestats or cfstats to get the sstable
>>> count and compare it to both the number of sstables that the loader is
>>> reading from and the number that end up in the target cluster.
>>>
>>> What Cassandra version did the sstables come from and what version are
>>> you importing into?
>>>
>>>
>>> -- Jack Krupansky
>>>
>>>
>>
>


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Kai Wang
Arindam,

what's the table schema and what does your query to retrieve the rows look
like?

On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
arindam.choudh...@ackstorm.com> wrote:

> Hi,
>
> I am importing data to a new cassandra cluster using sstableloader. The
> sstableloader runs without any warning or error. But I am missing around
> 1000 rows.
>
> Any feedback will be highly appreciated.
>
> Kind Regards,
> Arindam Choudhury
>


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Arindam Choudhury
Hi Kai,

The table schema is:

CREATE TABLE mordor.things_values_meta (
thing_id text,
key text,
bucket_timestamp timestamp,
total_rows counter,
PRIMARY KEY ((thing_id, key), bucket_timestamp)
) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32'}
AND compression = {'sstable_compression':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';


I am just running "select count(*) from things_values_meta ;" to get the
count.

Regards,
Arindam

On 29 January 2016 at 13:39, Kai Wang  wrote:

> Arindam,
>
> what's the table schema and what does your query to retrieve the rows look
> like?
>
> On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
> arindam.choudh...@ackstorm.com> wrote:
>
>> Hi,
>>
>> I am importing data to a new cassandra cluster using sstableloader. The
>> sstableloader runs without any warning or error. But I am missing around
>> 1000 rows.
>>
>> Any feedback will be highly appreciated.
>>
>> Kind Regards,
>> Arindam Choudhury
>>
>
>


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Arindam Choudhury
Why does cqlsh report 4692 when I query "select count(*) from
mordor.things_values_meta;",

while nodetool cfstats says Number of keys (estimate): 4720?

On 29 January 2016 at 16:25, Arindam Choudhury <
arindam.choudh...@ackstorm.com> wrote:

> I am counting the rows with "select count(*) from
> mordor.things_values_meta;"
>
> I am doing one node cluster to one node cluster for testing.
>
> On 29 January 2016 at 16:20, Jack Krupansky 
> wrote:
>
>> And how are you counting the rows? With a query? If, so, what is the
>> query. Using nodetool cfstats (estimated) key count? Or... what?
>>
>> Are the tokens for the missing rows is the same range and a distinct
>> range from the rest of the data in the original cluster?
>>
>> How many nodes in the original cluster?
>>
>> -- Jack Krupansky
>>
>> On Fri, Jan 29, 2016 at 10:12 AM, Arindam Choudhury <
>> arindam.choudh...@ackstorm.com> wrote:
>>
>>> I will check the output of nodetool cfstats.
>>>
>>> Its from version 2.1.2 to version 2.1.9.
>>>
>>>
>>
>


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Jack Krupansky
I agree that there should be clearer documentation on exactly how the
estimate is calculated. When I inquired about this recently, the response
was that it should be within about 2% of the actual key count. I started
looking at the code, but I ran out of time before I chased down all the
subsidiary factors in the calculation.

It would be nice to have an explicit nodetool option to count actual keys.
Presumably that would be more efficient than a select count(*).
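
As a quick sanity check on the two numbers quoted below (4692 from count(*) and 4720 from cfstats), here is the arithmetic in Python. It treats the count(*) result as the reference value, which is only a fair comparison if each partition holds a single row; count(*) counts rows, while cfstats estimates partition keys.

```python
def relative_error(estimate, reference):
    """Absolute difference between estimate and reference, relative to the reference."""
    return abs(estimate - reference) / reference


cfstats_estimate = 4720   # "Number of keys (estimate)" from nodetool cfstats
select_count = 4692       # result of select count(*)

err = relative_error(cfstats_estimate, select_count)
print(f"estimate is off by {err:.2%}")
print("within 2%" if err <= 0.02 else "outside 2%")
```

So the gap between the two figures is well inside the roughly 2% I was told to expect.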


-- Jack Krupansky

On Fri, Jan 29, 2016 at 11:27 AM, Arindam Choudhury <
arindam.choudh...@ackstorm.com> wrote:

> Why in cqlsh when I query "select count(*) from mordor.things_values_meta
> ;" it says: 4692
>
> But in nodetool cfstats it says Number of keys (estimate): 4720?
>
> On 29 January 2016 at 16:25, Arindam Choudhury <
> arindam.choudh...@ackstorm.com> wrote:
>
>> I am counting the rows with "select count(*) from
>> mordor.things_values_meta;"
>>
>> I am doing one node cluster to one node cluster for testing.
>>
>> On 29 January 2016 at 16:20, Jack Krupansky 
>> wrote:
>>
>>> And how are you counting the rows? With a query? If, so, what is the
>>> query. Using nodetool cfstats (estimated) key count? Or... what?
>>>
>>> Are the tokens for the missing rows is the same range and a distinct
>>> range from the rest of the data in the original cluster?
>>>
>>> How many nodes in the original cluster?
>>>
>>> -- Jack Krupansky
>>>

Re: missing rows while importing data using sstable loader

2016-01-29 Thread Jack Krupansky
And how are you counting the rows? With a query? If so, what is the query?
Using nodetool cfstats (estimated) key count? Or... what?

Are the tokens for the missing rows in the same range, and is that range
distinct from the rest of the data in the original cluster?

How many nodes in the original cluster?

-- Jack Krupansky

On Fri, Jan 29, 2016 at 10:12 AM, Arindam Choudhury <
arindam.choudh...@ackstorm.com> wrote:

> I will check the output of nodetool cfstats.
>
> Its from version 2.1.2 to version 2.1.9.
>
> On 29 January 2016 at 16:02, Jack Krupansky 
> wrote:
>
>> Are these sstables from an existing Cassandra cluster or generated by a
>> program?
>>
>> If the former, do a nodetool tablestats or cfstats to get the sstable
>> count and compare it to both the number of sstables that the loader is
>> reading from and the number that end up in the target cluster.
>>
>> What Cassandra version did the sstables come from and what version are
>> you importing into?
>>
>>
>> -- Jack Krupansky
>>
>> On Fri, Jan 29, 2016 at 9:34 AM, Arindam Choudhury <
>> arindam.choudh...@ackstorm.com> wrote:
>>
>>> Hi Romain,
>>>
>>> The RF was set to 2.
>>>
>>> I changed it to one.
>>>
>>>  CREATE KEYSPACE mordor WITH replication = {'class' : 'SimpleStrategy',
>>> 'replication_factor' : 1}  AND durable_writes = true;
>>>
>>> re-inserted the columns, still missing rows.
>>>
>>> Regards,
>>> Arindam
>>>
>>>
>>
>