Re: Is replication possible with already existing data?

2015-10-23 Thread Ajay Garg
Any ideas, please?
To repeat, we are using the exact same cassandra-version on all 4 nodes
(2.1.10).

On Fri, Oct 23, 2015 at 9:43 AM, Ajay Garg  wrote:

> Hi Michael.
>
> Please find below the contents of cassandra.yaml for CAS11 (the files on
> the rest of the three nodes are also exactly the same, except the
> "initial_token" and "listen_address" fields) ::
>
> CAS11 ::
>
> 
> cluster_name: 'InstaMsg Cluster'
> num_tokens: 256
> initial_token: -9223372036854775808
> hinted_handoff_enabled: true
> max_hint_window_in_ms: 10800000 # 3 hours
> hinted_handoff_throttle_in_kb: 1024
> max_hints_delivery_threads: 2
> batchlog_replay_throttle_in_kb: 1024
> authenticator: AllowAllAuthenticator
> authorizer: AllowAllAuthorizer
> permissions_validity_in_ms: 2000
> partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> data_file_directories:
> - /var/lib/cassandra/data
>
> commitlog_directory: /var/lib/cassandra/commitlog
>
> disk_failure_policy: stop
> commit_failure_policy: stop
> key_cache_size_in_mb:
> key_cache_save_period: 14400
> row_cache_size_in_mb: 0
> row_cache_save_period: 0
> counter_cache_size_in_mb:
> counter_cache_save_period: 7200
> saved_caches_directory: /var/lib/cassandra/saved_caches
> commitlog_sync: periodic
> commitlog_sync_period_in_ms: 10000
> commitlog_segment_size_in_mb: 32
> seed_provider:
> - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>   parameters:
>   - seeds: "104.239.200.33,119.9.92.77"
>
> concurrent_reads: 32
> concurrent_writes: 32
> concurrent_counter_writes: 32
>
> memtable_allocation_type: heap_buffers
>
> index_summary_capacity_in_mb:
> index_summary_resize_interval_in_minutes: 60
> trickle_fsync: false
> trickle_fsync_interval_in_kb: 10240
> storage_port: 7000
> ssl_storage_port: 7001
> listen_address: 104.239.200.33
> start_native_transport: true
> native_transport_port: 9042
> start_rpc: true
> rpc_address: localhost
> rpc_port: 9160
> rpc_keepalive: true
>
> rpc_server_type: sync
> thrift_framed_transport_size_in_mb: 15
> incremental_backups: false
> snapshot_before_compaction: false
> auto_snapshot: true
>
> tombstone_warn_threshold: 1000
> tombstone_failure_threshold: 100000
>
> column_index_size_in_kb: 64
> batch_size_warn_threshold_in_kb: 5
>
> compaction_throughput_mb_per_sec: 16
> compaction_large_partition_warning_threshold_mb: 100
>
> sstable_preemptive_open_interval_in_mb: 50
>
> read_request_timeout_in_ms: 5000
> range_request_timeout_in_ms: 10000
>
> write_request_timeout_in_ms: 2000
> counter_write_request_timeout_in_ms: 5000
> cas_contention_timeout_in_ms: 1000
> truncate_request_timeout_in_ms: 60000
> request_timeout_in_ms: 10000
> cross_node_timeout: false
> endpoint_snitch: PropertyFileSnitch
>
> dynamic_snitch_update_interval_in_ms: 100
> dynamic_snitch_reset_interval_in_ms: 600000
> dynamic_snitch_badness_threshold: 0.1
>
> request_scheduler: org.apache.cassandra.scheduler.NoScheduler
>
> server_encryption_options:
> internode_encryption: none
> keystore: conf/.keystore
> keystore_password: cassandra
> truststore: conf/.truststore
> truststore_password: cassandra
>
> client_encryption_options:
> enabled: false
> keystore: conf/.keystore
> keystore_password: cassandra
>
> internode_compression: all
> inter_dc_tcp_nodelay: false
> 
>
>
> What changes need to be made, so that whenever a downed server comes back
> up, the missing data comes back over to it?
>
> Thanks and Regards,
> Ajay
>
>
>
> On Fri, Oct 23, 2015 at 9:05 AM, Michael Shuler 
> wrote:
>
>> On 10/22/2015 10:14 PM, Ajay Garg wrote:
>>
>>> However, CAS11 refuses to come up now.
>>> Following is the error in /var/log/cassandra/system.log ::
>>>
>>>
>>> 
>>> ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
>>> configuration error
>>> org.apache.cassandra.exceptions.ConfigurationException: Cannot change
>>> the number of tokens from 1 to 256
>>>
>>
>> Check your cassandra.yaml - this node has vnodes enabled in the
>> configuration when it did not previously. Check all nodes. Something
>> changed. Mixed vnode/non-vnode clusters are bad juju.
>>
>> --
>> Kind regards,
>> Michael
>>
>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay
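
A note on the recovery question in this thread: writes that a node misses
during a short outage are replayed automatically by hinted handoff, but only
within max_hint_window_in_ms (three hours in the config above). For anything
longer, the usual route is a manual repair on the recovered node; a minimal
sketch, with the keyspace name assumed for illustration:

    nodetool repair my_keyspace

Hints cover only writes that arrived while the node was inside the hint
window; repair reconciles the rest.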


timestamp as clustering key doesn't work as expected

2015-10-23 Thread Kai Wang
Hi,

I use a timestamp column as the last clustering key so that I can run queries
like "timestamp > ... AND timestamp < ...". But it doesn't work as
expected. Here is a simplified example.

My table:
CREATE TABLE test (
    tag text,
    group int,
    timestamp timestamp,
    value double,
    PRIMARY KEY (tag, group, timestamp)
) WITH CLUSTERING ORDER BY (group ASC, timestamp DESC);

After inserting some data, here is my query:

cqlsh> select * from test where tag = 'MSFT' and group = 1 and timestamp
='2004-12-15 16:00:00-0500';

 tag  | group | timestamp                | value
------+-------+--------------------------+-------
 MSFT |     1 | 2004-12-15 21:00:00+0000 | 27.11
 MSFT |     1 | 2004-12-16 21:00:00+0000 | 27.16
 MSFT |     1 | 2004-12-17 21:00:00+0000 | 26.96
 MSFT |     1 | 2004-12-20 21:00:00+0000 | 26.95
 MSFT |     1 | 2004-12-21 21:00:00+0000 | 27.07
 MSFT |     1 | 2004-12-22 21:00:00+0000 | 26.98
 MSFT |     1 | 2004-12-23 21:00:00+0000 | 27.01
 MSFT |     1 | 2004-12-27 21:00:00+0000 | 26.85
 MSFT |     1 | 2004-12-28 21:00:00+0000 | 26.95
 MSFT |     1 | 2004-12-29 21:00:00+0000 |  26.9
 MSFT |     1 | 2004-12-30 21:00:00+0000 | 26.76
(11 rows)

This doesn't make sense. I expect this query to return only the first row.
Why does it give me back rows with different timestamps? Did I
misunderstand how timestamp and clustering key work?

Thanks.

-Kai
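
For comparison, the range form described above would look roughly like this
against the same table (the date bounds are assumed for illustration); an
equality match on the full clustering key should return at most one row:

    SELECT * FROM test
    WHERE tag = 'MSFT'
      AND group = 1
      AND timestamp > '2004-12-15 00:00:00+0000'
      AND timestamp < '2004-12-17 00:00:00+0000';

Against the data above, this should return only the 2004-12-15 and
2004-12-16 rows.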


Re: timestamp as clustering key doesn't work as expected

2015-10-23 Thread Kai Wang
Jon,

It's 2.1.10. I will see if I can reproduce it with a simple script.

Thanks.

On Fri, Oct 23, 2015 at 1:05 PM, Jon Haddad  wrote:

> What version of Cassandra?  I can’t think of a reason why you’d see this
> output.  If you can reliably reproduce, this should be filed as a JIRA.
> https://issues.apache.org/jira
>
>
>
> > On Oct 23, 2015, at 8:55 AM, Kai Wang  wrote:
> >
> > Hi,
> >
> > I use a timestamp column as the last clustering key so that I can run
> query like "timestamp > ... AND timestamp < ...". But it doesn't work as
> expected. Here is a simplified example.
> >
> > My table:
> > CREATE TABLE test (
> >     tag text,
> >     group int,
> >     timestamp timestamp,
> >     value double,
> >     PRIMARY KEY (tag, group, timestamp)
> > ) WITH CLUSTERING ORDER BY (group ASC, timestamp DESC);
> >
> > After inserting some data, here is my query:
> >
> > cqlsh> select * from test where tag = 'MSFT' and group = 1 and timestamp
> ='2004-12-15 16:00:00-0500';
> >
> >  tag  | group | timestamp                | value
> > ------+-------+--------------------------+-------
> >  MSFT |     1 | 2004-12-15 21:00:00+0000 | 27.11
> >  MSFT |     1 | 2004-12-16 21:00:00+0000 | 27.16
> >  MSFT |     1 | 2004-12-17 21:00:00+0000 | 26.96
> >  MSFT |     1 | 2004-12-20 21:00:00+0000 | 26.95
> >  MSFT |     1 | 2004-12-21 21:00:00+0000 | 27.07
> >  MSFT |     1 | 2004-12-22 21:00:00+0000 | 26.98
> >  MSFT |     1 | 2004-12-23 21:00:00+0000 | 27.01
> >  MSFT |     1 | 2004-12-27 21:00:00+0000 | 26.85
> >  MSFT |     1 | 2004-12-28 21:00:00+0000 | 26.95
> >  MSFT |     1 | 2004-12-29 21:00:00+0000 |  26.9
> >  MSFT |     1 | 2004-12-30 21:00:00+0000 | 26.76
> > (11 rows)
> >
> > This doesn't make sense. I expect this query to return only the first
> row. Why does it give me back rows with different timestamps? Did I
> misunderstand how timestamp and clustering key work?
> >
> > Thanks.
> >
> > -Kai
>
>


Re: timestamp as clustering key doesn't work as expected

2015-10-23 Thread Kai Wang
https://issues.apache.org/jira/browse/CASSANDRA-10583

On Fri, Oct 23, 2015 at 1:26 PM, Kai Wang  wrote:

> Jon,
>
> It's 2.1.10. I will see if I can reproduce it with a simple script.
>
> Thanks.
>
> On Fri, Oct 23, 2015 at 1:05 PM, Jon Haddad  wrote:
>
>> What version of Cassandra?  I can’t think of a reason why you’d see this
>> output.  If you can reliably reproduce, this should be filed as a JIRA.
>> https://issues.apache.org/jira
>>
>>
>>
>> > On Oct 23, 2015, at 8:55 AM, Kai Wang  wrote:
>> >
>> > Hi,
>> >
>> > I use a timestamp column as the last clustering key so that I can run
>> query like "timestamp > ... AND timestamp < ...". But it doesn't work as
>> expected. Here is a simplified example.
>> >
>> > My table:
>> > CREATE TABLE test (
>> >     tag text,
>> >     group int,
>> >     timestamp timestamp,
>> >     value double,
>> >     PRIMARY KEY (tag, group, timestamp)
>> > ) WITH CLUSTERING ORDER BY (group ASC, timestamp DESC);
>> >
>> > After inserting some data, here is my query:
>> >
>> > cqlsh> select * from test where tag = 'MSFT' and group = 1 and
>> timestamp ='2004-12-15 16:00:00-0500';
>> >
>> >  tag  | group | timestamp                | value
>> > ------+-------+--------------------------+-------
>> >  MSFT |     1 | 2004-12-15 21:00:00+0000 | 27.11
>> >  MSFT |     1 | 2004-12-16 21:00:00+0000 | 27.16
>> >  MSFT |     1 | 2004-12-17 21:00:00+0000 | 26.96
>> >  MSFT |     1 | 2004-12-20 21:00:00+0000 | 26.95
>> >  MSFT |     1 | 2004-12-21 21:00:00+0000 | 27.07
>> >  MSFT |     1 | 2004-12-22 21:00:00+0000 | 26.98
>> >  MSFT |     1 | 2004-12-23 21:00:00+0000 | 27.01
>> >  MSFT |     1 | 2004-12-27 21:00:00+0000 | 26.85
>> >  MSFT |     1 | 2004-12-28 21:00:00+0000 | 26.95
>> >  MSFT |     1 | 2004-12-29 21:00:00+0000 |  26.9
>> >  MSFT |     1 | 2004-12-30 21:00:00+0000 | 26.76
>> > (11 rows)
>> >
>> > This doesn't make sense. I expect this query to return only the first
>> row. Why does it give me back rows with different timestamps? Did I
>> misunderstand how timestamp and clustering key work?
>> >
>> > Thanks.
>> >
>> > -Kai
>>
>>
>


Re: timestamp as clustering key doesn't work as expected

2015-10-23 Thread Jon Haddad
What version of Cassandra?  I can’t think of a reason why you’d see this 
output.  If you can reliably reproduce, this should be filed as a JIRA. 
https://issues.apache.org/jira



> On Oct 23, 2015, at 8:55 AM, Kai Wang  wrote:
> 
> Hi,
> 
> I use a timestamp column as the last clustering key so that I can run query 
> like "timestamp > ... AND timestamp < ...". But it doesn't work as expected. 
> Here is a simplified example.
> 
> My table:
> CREATE TABLE test (
>     tag text,
>     group int,
>     timestamp timestamp,
>     value double,
>     PRIMARY KEY (tag, group, timestamp)
> ) WITH CLUSTERING ORDER BY (group ASC, timestamp DESC);
> 
> After inserting some data, here is my query:
> 
> cqlsh> select * from test where tag = 'MSFT' and group = 1 and timestamp 
> ='2004-12-15 16:00:00-0500';
> 
>  tag  | group | timestamp                | value
> ------+-------+--------------------------+-------
>  MSFT |     1 | 2004-12-15 21:00:00+0000 | 27.11
>  MSFT |     1 | 2004-12-16 21:00:00+0000 | 27.16
>  MSFT |     1 | 2004-12-17 21:00:00+0000 | 26.96
>  MSFT |     1 | 2004-12-20 21:00:00+0000 | 26.95
>  MSFT |     1 | 2004-12-21 21:00:00+0000 | 27.07
>  MSFT |     1 | 2004-12-22 21:00:00+0000 | 26.98
>  MSFT |     1 | 2004-12-23 21:00:00+0000 | 27.01
>  MSFT |     1 | 2004-12-27 21:00:00+0000 | 26.85
>  MSFT |     1 | 2004-12-28 21:00:00+0000 | 26.95
>  MSFT |     1 | 2004-12-29 21:00:00+0000 |  26.9
>  MSFT |     1 | 2004-12-30 21:00:00+0000 | 26.76
> (11 rows)
> 
> This doesn't make sense. I expect this query to return only the first row. 
> Why does it give me back rows with different timestamps? Did I misunderstand 
> how timestamp and clustering key work?
> 
> Thanks.
> 
> -Kai



Re: Is replication possible with already existing data?

2015-10-23 Thread Ajay Garg
Thanks Steve and Michael.

Simply uncommenting "initial_token" did the trick !!!

Until now, I was evaluating replication for the case where everything is a
clean install. I will now try my hand at integrating/starting replication
with pre-existing data.
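
For the pre-existing-data case, the usual sequence is to raise the
replication settings on the keyspace and then repair, so that existing rows
are streamed to their new replicas. A minimal sketch (the keyspace and
datacenter names are assumed, not from this thread):

    ALTER KEYSPACE my_keyspace
      WITH replication = {'class': 'NetworkTopologyStrategy',
                          'DC1': 2, 'DC2': 2};

followed by "nodetool repair my_keyspace" on each node.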


Once again, thanks a ton for all the help guys !!!


Thanks and Regards,
Ajay

On Sat, Oct 24, 2015 at 2:06 AM, Steve Robenalt 
wrote:

> Hi Ajay,
>
> Please take a look at the cassandra.yaml configuration reference regarding
> initial_token and num_tokens:
>
>
> http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__initial_token
>
> This is basically what Michael was referring to in his earlier message.
> Setting an initial token overrode your num_tokens setting on initial
> startup, but after initial startup, the initial token setting is ignored,
> so num_tokens comes into play, attempting to start up with 256 vnodes.
> That's where your error comes from.
>
> It's likely that all of your nodes started up like this since you have the
> same config on all of them (hopefully, you at least changed initial_token
> for each node).
>
> After reviewing the doc on the two sections above, you'll need to decide
> which path to take to recover. You can likely bring the downed node up by
> setting num_tokens to 1 (which you'd need to do on all nodes), in which
> case you're not really running vnodes. Alternately, you can migrate the
> cluster to vnodes:
>
>
> http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configVnodesProduction_t.html
>
> BTW, I recommend carefully reviewing the cassandra.yaml configuration
> reference for ANY change you make from the default. As you've experienced
> here, not all settings are intended to work together.
>
> HTH,
> Steve
>
>
>
> On Fri, Oct 23, 2015 at 12:07 PM, Ajay Garg 
> wrote:
>
>> Any ideas, please?
>> To repeat, we are using the exact same cassandra-version on all 4 nodes
>> (2.1.10).
>>
>> On Fri, Oct 23, 2015 at 9:43 AM, Ajay Garg 
>> wrote:
>>
>>> Hi Michael.
>>>
>>> Please find below the contents of cassandra.yaml for CAS11 (the files on
>>> the rest of the three nodes are also exactly the same, except the
>>> "initial_token" and "listen_address" fields) ::
>>>
>>> CAS11 ::
>>>
>>>
>>>
>>> What changes need to be made, so that whenever a downed server comes
>>> back up, the missing data comes back over to it?
>>>
>>> Thanks and Regards,
>>> Ajay
>>>
>>>
>>>
>>> On Fri, Oct 23, 2015 at 9:05 AM, Michael Shuler 
>>> wrote:
>>>
 On 10/22/2015 10:14 PM, Ajay Garg wrote:

> However, CAS11 refuses to come up now.
> Following is the error in /var/log/cassandra/system.log ::
>
>
> 
> ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
> configuration error
> org.apache.cassandra.exceptions.ConfigurationException: Cannot change
> the number of tokens from 1 to 256
>

 Check your cassandra.yaml - this node has vnodes enabled in the
 configuration when it did not previously. Check all nodes. Something
 changed. Mixed vnode/non-vnode clusters are bad juju.

 --
 Kind regards,
 Michael

>>>
>>>
>>>
>>> --
>>> Regards,
>>> Ajay
>>>
>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>
>
> --
> Steve Robenalt
> Software Architect
> sroben...@highwire.org 
> (office/cell): 916-505-1785
>
> HighWire Press, Inc.
> 425 Broadway St, Redwood City, CA 94063
> www.highwire.org
>
> Technology for Scholarly Communication
>



-- 
Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-23 Thread Steve Robenalt
Hi Ajay,

Please take a look at the cassandra.yaml configuration reference regarding
initial_token and num_tokens:

http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__initial_token

This is basically what Michael was referring to in his earlier message.
Setting an initial token overrode your num_tokens setting on initial
startup, but after initial startup, the initial token setting is ignored,
so num_tokens comes into play, attempting to start up with 256 vnodes.
That's where your error comes from.

It's likely that all of your nodes started up like this since you have the
same config on all of them (hopefully, you at least changed initial_token
for each node).

After reviewing the doc on the two sections above, you'll need to decide
which path to take to recover. You can likely bring the downed node up by
setting num_tokens to 1 (which you'd need to do on all nodes), in which
case you're not really running vnodes. Alternately, you can migrate the
cluster to vnodes:

http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configVnodesProduction_t.html

BTW, I recommend carefully reviewing the cassandra.yaml configuration
reference for ANY change you make from the default. As you've experienced
here, not all settings are intended to work together.

HTH,
Steve
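
To make the two modes concrete: a node runs either with vnodes or with a
single manually assigned token, never both. A minimal sketch of each
cassandra.yaml fragment (the token value is the one from this thread; treat
the rest as an assumption):

    # vnodes: let Cassandra pick 256 tokens; leave initial_token unset
    num_tokens: 256
    # initial_token:

    # single token: assign it explicitly; num_tokens absent (or 1)
    # num_tokens: 1
    initial_token: -9223372036854775808

Mixing the two, as in the posted config, is what produces the "Cannot change
the number of tokens from 1 to 256" error on restart.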



On Fri, Oct 23, 2015 at 12:07 PM, Ajay Garg  wrote:

> Any ideas, please?
> To repeat, we are using the exact same cassandra-version on all 4 nodes
> (2.1.10).
>
> On Fri, Oct 23, 2015 at 9:43 AM, Ajay Garg  wrote:
>
>> Hi Michael.
>>
>> Please find below the contents of cassandra.yaml for CAS11 (the files on
>> the rest of the three nodes are also exactly the same, except the
>> "initial_token" and "listen_address" fields) ::
>>
>> CAS11 ::
>>
>> 
>> cluster_name: 'InstaMsg Cluster'
>> num_tokens: 256
>> initial_token: -9223372036854775808
>> hinted_handoff_enabled: true
>> max_hint_window_in_ms: 10800000 # 3 hours
>> hinted_handoff_throttle_in_kb: 1024
>> max_hints_delivery_threads: 2
>> batchlog_replay_throttle_in_kb: 1024
>> authenticator: AllowAllAuthenticator
>> authorizer: AllowAllAuthorizer
>> permissions_validity_in_ms: 2000
>> partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>> data_file_directories:
>> - /var/lib/cassandra/data
>>
>> commitlog_directory: /var/lib/cassandra/commitlog
>>
>> disk_failure_policy: stop
>> commit_failure_policy: stop
>> key_cache_size_in_mb:
>> key_cache_save_period: 14400
>> row_cache_size_in_mb: 0
>> row_cache_save_period: 0
>> counter_cache_size_in_mb:
>> counter_cache_save_period: 7200
>> saved_caches_directory: /var/lib/cassandra/saved_caches
>> commitlog_sync: periodic
>> commitlog_sync_period_in_ms: 10000
>> commitlog_segment_size_in_mb: 32
>> seed_provider:
>> - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>>   parameters:
>>   - seeds: "104.239.200.33,119.9.92.77"
>>
>> concurrent_reads: 32
>> concurrent_writes: 32
>> concurrent_counter_writes: 32
>>
>> memtable_allocation_type: heap_buffers
>>
>> index_summary_capacity_in_mb:
>> index_summary_resize_interval_in_minutes: 60
>> trickle_fsync: false
>> trickle_fsync_interval_in_kb: 10240
>> storage_port: 7000
>> ssl_storage_port: 7001
>> listen_address: 104.239.200.33
>> start_native_transport: true
>> native_transport_port: 9042
>> start_rpc: true
>> rpc_address: localhost
>> rpc_port: 9160
>> rpc_keepalive: true
>>
>> rpc_server_type: sync
>> thrift_framed_transport_size_in_mb: 15
>> incremental_backups: false
>> snapshot_before_compaction: false
>> auto_snapshot: true
>>
>> tombstone_warn_threshold: 1000
>> tombstone_failure_threshold: 100000
>>
>> column_index_size_in_kb: 64
>> batch_size_warn_threshold_in_kb: 5
>>
>> compaction_throughput_mb_per_sec: 16
>> compaction_large_partition_warning_threshold_mb: 100
>>
>> sstable_preemptive_open_interval_in_mb: 50
>>
>> read_request_timeout_in_ms: 5000
>> range_request_timeout_in_ms: 10000
>>
>> write_request_timeout_in_ms: 2000
>> counter_write_request_timeout_in_ms: 5000
>> cas_contention_timeout_in_ms: 1000
>> truncate_request_timeout_in_ms: 60000
>> request_timeout_in_ms: 10000
>> cross_node_timeout: false
>> endpoint_snitch: PropertyFileSnitch
>>
>> dynamic_snitch_update_interval_in_ms: 100
>> dynamic_snitch_reset_interval_in_ms: 600000
>> dynamic_snitch_badness_threshold: 0.1
>>
>> request_scheduler: org.apache.cassandra.scheduler.NoScheduler
>>
>> server_encryption_options:
>> internode_encryption: none
>> keystore: conf/.keystore
>> keystore_password: cassandra
>> truststore: conf/.truststore
>> truststore_password: cassandra
>>
>> client_encryption_options:
>> enabled: false
>> keystore: conf/.keystore
>> keystore_password: cassandra
>>
>> internode_compression: all
>> inter_dc_tcp_nodelay: false
>> 
>>
>>
>> What changes need to be made, so that whenever a downed server comes
>> back up, the missing data comes back over to it?

Re: Automatic pagination does not get all results

2015-10-23 Thread Sid Tantia
Hello Jeff,

I'm using Cassandra v2.1.4
I'm expecting the number of results to be the same every time I use the
COPY command (specifically I'm using `COPY <table> TO stdout`). However,
here are the counts of rows exported each time I ran COPY:
1) 180389 rows exported
2) 181212 rows exported
3) 178641 rows exported
4) 176688 rows exported
5) 175433 rows exported

So it's found a different number of rows to export every single time I've
run the command, even though it's on the same table and no additional writes
have been made.

CL for read is ALL
CL for write is ONE
Yes, I've run repair since last writing data.
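
For reference, a paged read that pins the page size and consistency level
explicitly might look like the following with the DataStax Java driver
(2.1-era API; the contact point, keyspace, and table names are assumed):

    import com.datastax.driver.core.*;

    public class PagedCount {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("my_keyspace");

            // Fetch 1000 rows per page, reading at CL ALL
            Statement stmt = new SimpleStatement("SELECT * FROM my_table")
                    .setFetchSize(1000)
                    .setConsistencyLevel(ConsistencyLevel.ALL);

            // Iterating the ResultSet pulls further pages transparently
            long count = 0;
            for (Row row : session.execute(stmt)) {
                count++;
            }
            System.out.println(count + " rows read");
            cluster.close();
        }
    }

If repeated runs of a loop like this also disagree on the count, that would
point at paging itself rather than the cqlsh COPY path.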

On Thu, Oct 22, 2015 at 9:02 PM, Jeff Jirsa 
wrote:

> It’s possible that it could be different depending on your consistency
> level (on write and on read).
>
> It’s also possible it’s a bug, but you didn’t give us much information –
> here are some questions to help us help you:
>
> What version?
> What results are you seeing?
> What’s the “right” result?
> What CL did you use to write the data?
> What CL did you use to read the data?
> Have you run repair since writing the data?
>
>
> From: Sid Tantia
> Reply-To: "user@cassandra.apache.org"
> Date: Thursday, October 22, 2015 at 5:49 PM
> To: user
> Subject: Automatic pagination does not get all results
>
> Hello,
>
> Has anyone had a problem with automatic pagination returning different
> results every time (this is for a table with ~180,000 rows)? I'm going
> through each page and inserting the results into an array and each time I
> go through all the pages, the resultant array has a different size.
>
> This happens whether I use a SELECT query with automatic paging using the
> Ruby driver or a COPY to CSV command with cqlsh.
>
> -Sid
>
>


CqlOutputFormat with auth

2015-10-23 Thread 曹志富
Hadoop 2.6, Cassandra 2.1.6. Here is the exception stack:


Error: java.lang.RuntimeException: InvalidRequestException(why:You have not
logged in)
at
org.apache.cassandra.hadoop.cql3.CqlRecordWriter.<init>(CqlRecordWriter.java:121)
at
org.apache.cassandra.hadoop.cql3.CqlRecordWriter.<init>(CqlRecordWriter.java:88)
at
org.apache.cassandra.hadoop.cql3.CqlOutputFormat.getRecordWriter(CqlOutputFormat.java:74)
at
org.apache.cassandra.hadoop.cql3.CqlOutputFormat.getRecordWriter(CqlOutputFormat.java:55)
at
org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:540)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:614)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: InvalidRequestException(why:You have not logged in)
at
org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result$execute_cql3_query_resultStandardScheme.read(Cassandra.java:49032)
at
org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result$execute_cql3_query_resultStandardScheme.read(Cassandra.java:49009)
at
org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.read(Cassandra.java:48924)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at
org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql3_query(Cassandra.java:1693)
at
org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Cassandra.java:1678)
at
org.apache.cassandra.hadoop.cql3.CqlRecordWriter.retrievePartitionKeyValidator(CqlRecordWriter.java:335)
at
org.apache.cassandra.hadoop.cql3.CqlRecordWriter.<init>(CqlRecordWriter.java:106)
... 11 more


This issue (https://issues.apache.org/jira/browse/CASSANDRA-7340) seems
to be still not completely fixed.
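
The usual workaround is to put the credentials into the Hadoop job
configuration before the output format is used; a minimal sketch, assuming
the ConfigHelper setters behave as named (the keyspace, user, and password
values are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.cassandra.hadoop.ConfigHelper;

    // job is the Hadoop Job being configured
    Configuration conf = job.getConfiguration();
    ConfigHelper.setOutputKeyspace(conf, "my_keyspace");
    ConfigHelper.setOutputKeyspaceUserNameAndPassword(conf, "user", "pass");

If the writer still issues its queries before logging in after that, it
would match the unfinished part of the ticket above.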



--
Ranger Tsao