Re: How to register cassandra custom codecs in spark? (https://github.com/datastax/spark-cassandra-connector/issues/1173)

2018-04-15 Thread Dinesh Joshi
Hi Revanth,
I took a quick look and don't think you can override existing mappings. You should see a warning like this:

18/04/15 19:50:27 WARN CodecRegistry: Ignoring codec CustomTSTypeCodec [date <-> java.lang.Long] because it collides with previously registered codec CustomTSTypeCodec [date <-> java.lang.Long]

The issue is that CodecRegistry does not allow you to override existing type mappings [1][2]. You can, however, register a codec that converts between a CQL type and your custom class.
Dinesh
[1] https://docs.datastax.com/en/developer/java-driver/3.1/manual/custom_codecs/
[2] https://github.com/datastax/java-driver/blob/3.x/driver-core/src/main/java/com/datastax/driver/core/CodecRegistry.java#L357
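
To make the distinction concrete, here is a minimal sketch (Scala against Java driver 3.x; MillisSinceEpoch and MillisCodec are hypothetical names, not part of any library) of registering a codec for a new Java type rather than overriding the built-in timestamp mapping:

import java.nio.ByteBuffer
import com.datastax.driver.core.{DataType, ProtocolVersion, TypeCodec}

case class MillisSinceEpoch(value: Long)

// Delegates the wire format to the built-in timestamp <-> java.util.Date codec.
class MillisCodec extends TypeCodec[MillisSinceEpoch](
    DataType.timestamp(), classOf[MillisSinceEpoch]) {

  private val inner = TypeCodec.timestamp()

  override def serialize(v: MillisSinceEpoch, pv: ProtocolVersion): ByteBuffer =
    if (v == null) null else inner.serialize(new java.util.Date(v.value), pv)

  override def deserialize(bytes: ByteBuffer, pv: ProtocolVersion): MillisSinceEpoch = {
    val d = inner.deserialize(bytes, pv)
    if (d == null) null else MillisSinceEpoch(d.getTime)
  }

  override def parse(s: String): MillisSinceEpoch = {
    val d = inner.parse(s)
    if (d == null) null else MillisSinceEpoch(d.getTime)
  }

  override def format(v: MillisSinceEpoch): String =
    if (v == null) "NULL" else inner.format(new java.util.Date(v.value))
}

// Registering [timestamp <-> MillisSinceEpoch] does not collide, because the
// (CQL type, Java type) pair is new:
//   cluster.getConfiguration.getCodecRegistry.register(new MillisCodec)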

On Sunday, April 15, 2018, 1:06:53 AM PDT, Revanth Reddy wrote:

Hi Dinesh,
Thanks a lot for your response. I tried the existing codec (SimpleTimestampCodec); it says the codec has been registered, but it is not converting the timestamp column to long, and I don't observe any error.
Below are the steps I followed (the code I used is attached):
I used this property to set the codec in the Spark config:

config("spark.cassandra.connection.factory",
       "com.rasa.devops.cassandra.export.CustomConnectionFactory")

Since there is no direct way to refer to a custom codec in Spark, I followed the above approach.
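
For reference, a sketch of what such a factory can look like (assuming the CassandraConnectionFactory trait of spark-cassandra-connector 2.0.x; the codec registered here is SimpleTimestampCodec from cassandra-driver-extras):

package com.rasa.devops.cassandra.export

import com.datastax.driver.core.Cluster
import com.datastax.driver.extras.codecs.date.SimpleTimestampCodec
import com.datastax.spark.connector.cql.{
  CassandraConnectionFactory, CassandraConnectorConf, DefaultConnectionFactory}

object CustomConnectionFactory extends CassandraConnectionFactory {
  override def createCluster(conf: CassandraConnectorConf): Cluster = {
    // Build the cluster the default way, then piggy-back codec registration
    // before handing the cluster back to the connector.
    val cluster = DefaultConnectionFactory.createCluster(conf)
    cluster.getConfiguration.getCodecRegistry
      .register(SimpleTimestampCodec.instance)
    cluster
  }
}
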
I used the following dependencies:

- com.datastax.spark : spark-cassandra-connector_2.11 : 2.0.1
- com.datastax.cassandra : cassandra-driver-extras : 3.1.4
- joda-time : joda-time : 2.9.1
Cassandra table used for testing:

CREATE TABLE dev.my_table (
    partition_key text PRIMARY KEY,
    some_timestamp timestamp
);

cqlsh:dev> select * from my_table;

 partition_key | some_timestamp
---------------+--------------------------
           key | 2015-12-25 18:30:00+0000
           foo | 2018-04-11 06:22:29+0000

Thanks & Regards
S. Revanth Kumar Reddy
https://github.com/Re1tReddy/
http://34.230.100.5:8080/crazyusers/login

On Sun, Apr 15, 2018 at 11:36 AM, Dinesh Joshi  wrote:

Hi Revanth,
How do you register the custom codec? Do you get any errors? Have you tried using a pre-existing codec? It would be helpful if you could give more information.
Dinesh

On Saturday, April 14, 2018, 7:29:30 PM PDT, Revanth Reddy wrote:

Hi Team,
I want to write a custom Cassandra codec and use that codec in my Spark application while reading data from a Cassandra table.
Basically, custom codecs are used to convert one column type to another while reading from Cassandra. For example, I have a timestamp column in a Cassandra table, but I want to read it as long and save it as a Parquet file to HDFS; a custom codec would be useful there instead of an additional transformation in the Spark code.
I want to know how to set/register custom Cassandra codecs in Spark.
I tried the solution provided in the link below, but it does not work; any ideas around this will be helpful.
https://stackoverflow.com/questions/40363611/adding-custom-codec-to-cassandraconnector/49750791?noredirect=1#comment86515847_49750791

Thanks & Regards
S. Revanth Kumar Reddy
https://github.com/Re1tReddy/
http://34.230.100.5:8080/crazyusers/login

Time serial column family design

2018-04-15 Thread Xiangfei Ni
Hi Experts,
  We have a design requirement. For example:

CREATE TABLE test (
    vin text,
    create_date int,
    create_time timestamp,
    a text,
    b text,
    PRIMARY KEY ((vin, create_date), create_time)
) WITH CLUSTERING ORDER BY (create_time DESC);
  We store data in this table like this:

 vin                   | create_date | create_time                 | a     | b
-----------------------+-------------+-----------------------------+-------+------
 ZD41578123DSAFWE12313 |    20180316 | 2018-03-16 20:51:33.00+0800 | P023  | P001
 ZD41578123DSAFWE12313 |    20180315 | 2017-03-15 20:51:33.00+0800 | P000  | P001
 ZD41578123DSAFWE12313 |    20180314 | 2017-03-14 20:51:33.00+0800 | P456  | P001
 3431241241234         |    20180317 | 2017-03-17 20:51:33.00+0800 | P000  | P001
 3431241241234         |    20180316 | 2017-03-16 20:51:33.00+0800 | P123  | P001
 3431241241234         |    20180315 | 2017-03-15 20:51:33.00+0800 | P456  | P001
 3431241241234         |    20180314 | 2017-03-14 20:51:33.00+0800 | P789  | P001
 ZD41578123DSAFWE1     |    20180314 | 2017-03-14 20:51:33.00+0800 | P023  | P001
 41034800994           |    20180313 | 2017-03-13 08:26:55.00+0800 | P0133 | P001
 41034800994           |    20180312 | 2017-03-12 08:26:55.00+0800 | P0420 | P001
We know that we can only use "=" or "IN" on the partition key in a query. My question is: is there a convenient way to query a range of results, or another design for this requirement, for example fetching 3 or 6 months backward from today? Currently we can only use:

SELECT * FROM test WHERE vin = 'ZD41578123DSAFWE12313' AND create_date IN (20180416, 20180415, 20180414, 20180413, 20180412, ...);

But this makes the CQL query very long, and I don't know whether there is a limit on the length of a CQL statement.
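
For illustration, a sketch (Scala, DataStax Java driver 3.x; contact point and keyspace are placeholders) of keeping each statement short by issuing one single-partition query per day instead of one long IN list:

import java.time.LocalDate
import java.time.format.DateTimeFormatter
import com.datastax.driver.core.Cluster
import scala.collection.JavaConverters._

object RangeByDay {
  def main(args: Array[String]): Unit = {
    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build() // placeholder
    val session = cluster.connect("mykeyspace")                          // placeholder
    val fmt     = DateTimeFormatter.ofPattern("yyyyMMdd")
    val vin     = "ZD41578123DSAFWE12313"

    val stmt = session.prepare(
      "SELECT * FROM test WHERE vin = ? AND create_date = ?")

    // One bucket per day for ~3 months back; each bind hits a single
    // partition, and the async queries run concurrently.
    val futures = (0 until 90).map { n =>
      val day = LocalDate.now().minusDays(n).format(fmt).toInt
      session.executeAsync(stmt.bind(vin, Int.box(day)))
    }

    futures.foreach(f => f.getUninterruptibly.all().asScala.foreach(println))
    cluster.close()
  }
}
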
Please give me some advice; thanks in advance.

Best Regards,

倪项菲/ David Ni
中移德电网络科技有限公司
Virtue Intelligent Network Ltd, co.
Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
Mob: +86 13797007811|Tel: + 86 27 5024 2516



Re: Mailing list server IPs

2018-04-15 Thread Nate McCall
Hi Jacques,
Thanks for bringing this up. I took a quick look through the INFRA project
and saw a couple of resolved issues that might help:
https://issues.apache.org/jira/browse/INFRA-6584?jql=project%20%3D%20INFRA%20AND%20text%20~%20%22mail%20server%20whitelist%22

If those don't do it for you, please open a new issue with INFRA.


On Sat, Apr 14, 2018 at 1:19 AM, Jacques-Henri Berthemet <
jacques-henri.berthe...@genesys.com> wrote:

> I checked with IT, and I did miss an email during the period when I got the
> last bounce. It's not a very big deal, but I'd like to have it fixed if
> possible.
>
>
>
> Gmail servers are very picky about SMTP traffic and reject a lot of things.
>
>
>
> *--*
>
> *Jacques-Henri Berthemet*
>
>
>
> *From:* Nicolas Guyomar [mailto:nicolas.guyo...@gmail.com]
> *Sent:* Friday, April 13, 2018 3:15 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Mailing list server IPs
>
>
>
> Hi,
>
>
>
> I receive similar messages from time to time, and I'm using Gmail ;) I
> believe I've never missed a mail on the ML, so you can safely ignore this
> message.
>
>
>
> On 13 April 2018 at 15:06, Jacques-Henri Berthemet <
> jacques-henri.berthe...@genesys.com> wrote:
>
> Hi,
>
>
>
> I’m getting bounce messages from the ML from time to time, see attached
> example. Our IT told me that they need to whitelist all IPs used by
> Cassandra ML server. Is there a way to get those IPs?
>
>
>
> Sorry if it’s not really related to Cassandra itself but I didn’t find
> anything in http://untroubled.org/ezmlm/ezman/ezman5.html commands.
>
>
>
> Regards,
>
> --
>
> Jacques-Henri Berthemet
>
>
>
> -- Forwarded message --
> From: "user-h...@cassandra.apache.org" 
> To: Jacques-Henri Berthemet 
> Cc:
> Bcc:
> Date: Fri, 6 Apr 2018 20:47:22 +
> Subject: Warning from user@cassandra.apache.org
> Hi! This is the ezmlm program. I'm managing the
> user@cassandra.apache.org mailing list.
>
>
> Messages to you from the user mailing list seem to
> have been bouncing. I've attached a copy of the first bounce
> message I received.
>
> If this message bounces too, I will send you a probe. If the probe bounces,
> I will remove your address from the user mailing list,
> without further notice.
>
>
> I've kept a list of which messages from the user mailing list have
> bounced from your address.
>
> Copies of these messages may be in the archive.
> To retrieve a set of messages 123-145 (a maximum of 100 per request),
> send a short message to:
>
>
> To receive a subject and author list for the last 100 or so messages,
> send a short message to:
>
>
> Here are the message numbers:
>
>60535
>60536
>60548
>
> --- Enclosed is a copy of the bounce message I received.
>
> Return-Path: <>
> Received: (qmail 8848 invoked for bounce); 27 Mar 2018 14:22:11 -
> Date: 27 Mar 2018 14:22:11 -
> From: mailer-dae...@apache.org
> To: user-return-605...@cassandra.apache.org
> Subject: failure notice
>
>
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>
>



-- 
-
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Shifting data to DCOS

2018-04-15 Thread Patrick Bannister
*nodetool ring* will give you the tokens for each node on the ring. Each
node has the token range between the previous node's token and its token -
so the token range for each node is the interval (previous_token,
this_token]. The first node in the ring has the range between the last
node's token and its token (the "wrapping range").
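
As a toy illustration (Scala; the three tokens below are invented Murmur3 values, not from any real ring), deriving each node's primary range from the sorted tokens reported by nodetool ring:

object TokenRanges extends App {
  val tokens = Seq(-6148914691236517206L, 0L, 6148914691236517204L).sorted

  tokens.indices.foreach { i =>
    // Each node owns (previous_token, this_token]; the first node wraps
    // around from the last token on the ring.
    val prev = tokens(if (i == 0) tokens.size - 1 else i - 1)
    val note = if (i == 0) " (wrapping range)" else ""
    println(s"node $i owns (${prev}, ${tokens(i)}]$note")
  }
}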

Patrick Bannister


On Sun, Apr 15, 2018 at 11:23 AM, Faraz Mateen  wrote:

> *UPDATE* - I created the schema for all the tables in one of the keyspaces,
> copied data to new directories and ran nodetool refresh. However, a lot of
> data seems to be missing.
>
> I ran nodetool repair on all three nodes one by one. First two nodes took
> around 20 minutes (each) to complete. Third node took a lot of time to
> repair and did not complete even in 14 hours. Eventually I had to stop it
> manually.
>
> *nodetool compactionstats *give me the "pending tasks by table name"
> traceback which can be viewed here:
> https://gist.github.com/farazmateen/10adce4b2477457f0e20fc95176f66a3
>
> *nodetool netstats* shows a lot of dropped gossip messages on all the
> nodes. Here is the output from one of the nodes:
>
> Mode: NORMAL
> Not sending any streams.
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 1
> Mismatch (Background): 2
> Pool Name         Active  Pending  Completed  Dropped
> Large messages    n/a     0        92         1
> Small messages    n/a     0        355491     0
> Gossip messages   n/a     5        3726945    286613
>
> Is the problem related to token ranges? How can I find out token range for
> each node?
> What can I do to further debug and root cause this?
>
> On Tue, Apr 10, 2018 at 4:28 PM, Faraz Mateen  wrote:
>
>> Sorry for the late reply. I was trying to figure out some other approach
>> to it.
>>
>> @Kurt - My previous cluster has 3 nodes but replication factor is 2. I am
>> not exactly sure how I would handle the tokens. Can you explain that a bit?
>>
>> @Michael - Actually, my DC/OS cluster has an older version than my
>> previous cluster. However both of them have hash with their data
>> directories. Previous cluster is on version 3.9 while new DC/OS cluster is
>> on 3.0.16.
>>
>>
>> On Fri, Apr 6, 2018 at 2:35 PM, kurt greaves 
>> wrote:
>>
>>> Without looking at the code I'd say maybe the keyspaces are displayed
>>> purely because the directories exist (but it seems unlikely). The process
>>> you should follow instead is to exclude the system keyspaces for each node
>>> and manually apply your schema, then upload your CFs into the correct
>>> directory. Note this only works when RF=#nodes, if you have more nodes you
>>> need to take tokens into account when restoring.
>>>
>>>
>>> On Fri., 6 Apr. 2018, 17:16 Affan Syed,  wrote:
>>>
 Michael,

 Both of the folders are with hash, so I don't think that would be an
 issue.

 What is strange is why the tables don't show up if the keyspaces are
 visible. Shouldn't that be metadata that can be edited once and then be
 visible?

 Affan

 - Affan

 On Thu, Apr 5, 2018 at 7:55 PM, Michael Shuler 
 wrote:

> On 04/05/2018 09:04 AM, Faraz Mateen wrote:
> >
> > For example,  if the table is *data_main_bim_dn_10*, its data
> directory
> > is named data_main_bim_dn_10-a73202c02bf311e8b5106b13f463f8b9. I
> created
> > a new table with the same name through cqlsh. This resulted in
> creation
> > of another directory with a different hash i.e.
> > data_main_bim_dn_10-c146e8d038c611e8b48cb7bc120612c9. I copied all
> data
> > from the former to the latter.
> >
> > Then I ran *"nodetool refresh ks1  data_main_bim_dn_10"*. After that
> I
> > was able to access all data contents through cqlsh.
> >
> > Now, the problem is, I have around 500 tables and the method I
> mentioned
> > above is quite cumbersome. Bulkloading through sstableloader or
> remote
> > seeding are also a couple of options but they will take a lot of
> time.
> > Does anyone know an easier way to shift all my data to new setup on
> DC/OS?
>
> For upgrade support from older versions of C* that did not have the
> hash
> on the data directory, the table data dir can be just
> `data_main_bim_dn_10` without the appended hash, as in your example.
>
> Give that a quick test to see if that simplifies things for you.
>
> --
> Kind regards,
> Michael
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>

>>
>>
>> --
>> Faraz Mateen
>>
>
>
>
> --
> Faraz Mateen
>


Re: Shifting data to DCOS

2018-04-15 Thread Faraz Mateen
*UPDATE* - I created the schema for all the tables in one of the keyspaces,
copied data to new directories and ran nodetool refresh. However, a lot of
data seems to be missing.

I ran nodetool repair on all three nodes one by one. First two nodes took
around 20 minutes (each) to complete. Third node took a lot of time to
repair and did not complete even in 14 hours. Eventually I had to stop it
manually.

*nodetool compactionstats *give me the "pending tasks by table name"
traceback which can be viewed here:
https://gist.github.com/farazmateen/10adce4b2477457f0e20fc95176f66a3

*nodetool netstats* shows a lot of dropped gossip messages on all the
nodes. Here is the output from one of the nodes:

Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 1
Mismatch (Background): 2
Pool Name         Active  Pending  Completed  Dropped
Large messages    n/a     0        92         1
Small messages    n/a     0        355491     0
Gossip messages   n/a     5        3726945    286613

Is the problem related to token ranges? How can I find out token range for
each node?
What can I do to further debug and root cause this?

On Tue, Apr 10, 2018 at 4:28 PM, Faraz Mateen  wrote:

> Sorry for the late reply. I was trying to figure out some other approach
> to it.
>
> @Kurt - My previous cluster has 3 nodes but replication factor is 2. I am
> not exactly sure how I would handle the tokens. Can you explain that a bit?
>
> @Michael - Actually, my DC/OS cluster has an older version than my
> previous cluster. However both of them have hash with their data
> directories. Previous cluster is on version 3.9 while new DC/OS cluster is
> on 3.0.16.
>
>
> On Fri, Apr 6, 2018 at 2:35 PM, kurt greaves  wrote:
>
>> Without looking at the code I'd say maybe the keyspaces are displayed
>> purely because the directories exist (but it seems unlikely). The process
>> you should follow instead is to exclude the system keyspaces for each node
>> and manually apply your schema, then upload your CFs into the correct
>> directory. Note this only works when RF=#nodes, if you have more nodes you
>> need to take tokens into account when restoring.
>>
>>
>> On Fri., 6 Apr. 2018, 17:16 Affan Syed,  wrote:
>>
>>> Michael,
>>>
>>> Both of the folders are with hash, so I don't think that would be an
>>> issue.
>>>
>>> What is strange is why the tables don't show up if the keyspaces are
>>> visible. Shouldn't that be metadata that can be edited once and then be
>>> visible?
>>>
>>> Affan
>>>
>>> - Affan
>>>
>>> On Thu, Apr 5, 2018 at 7:55 PM, Michael Shuler 
>>> wrote:
>>>
 On 04/05/2018 09:04 AM, Faraz Mateen wrote:
 >
 > For example,  if the table is *data_main_bim_dn_10*, its data
 directory
 > is named data_main_bim_dn_10-a73202c02bf311e8b5106b13f463f8b9. I
 created
 > a new table with the same name through cqlsh. This resulted in
 creation
 > of another directory with a different hash i.e.
 > data_main_bim_dn_10-c146e8d038c611e8b48cb7bc120612c9. I copied all
 data
 > from the former to the latter.
 >
 > Then I ran *"nodetool refresh ks1  data_main_bim_dn_10"*. After that I
 > was able to access all data contents through cqlsh.
 >
 > Now, the problem is, I have around 500 tables and the method I
 mentioned
 > above is quite cumbersome. Bulkloading through sstableloader or remote
 > seeding are also a couple of options but they will take a lot of time.
 > Does anyone know an easier way to shift all my data to new setup on
 DC/OS?

 For upgrade support from older versions of C* that did not have the hash
 on the data directory, the table data dir can be just
 `data_main_bim_dn_10` without the appended hash, as in your example.

 Give that a quick test to see if that simplifies things for you.

 --
 Kind regards,
 Michael

 -
 To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
 For additional commands, e-mail: user-h...@cassandra.apache.org


>>>
>
>
> --
> Faraz Mateen
>



-- 
Faraz Mateen


Re: How to register cassandra custom codecs in spark? (https://github.com/datastax/spark-cassandra-connector/issues/1173)

2018-04-15 Thread Dinesh Joshi
Hi Revanth,
How do you register the custom codec? Do you get any errors? Have you tried using a pre-existing codec? It would be helpful if you could give more information.
Dinesh

On Saturday, April 14, 2018, 7:29:30 PM PDT, Revanth Reddy wrote:

Hi Team,
I want to write a custom Cassandra codec and use that codec in my Spark application while reading data from a Cassandra table.
Basically, custom codecs are used to convert one column type to another while reading from Cassandra. For example, I have a timestamp column in a Cassandra table, but I want to read it as long and save it as a Parquet file to HDFS; a custom codec would be useful there instead of an additional transformation in the Spark code.
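
For context, a sketch of the surrounding read-and-write flow (Spark 2.x DataFrame API; host, keyspace, table, and output path are placeholders, and the codec question itself is separate from this):

import org.apache.spark.sql.SparkSession

object ExportToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cassandra-to-parquet")
      .config("spark.cassandra.connection.host", "127.0.0.1") // placeholder
      .getOrCreate()

    // Read the table through the connector's DataFrame source.
    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "dev", "table" -> "my_table"))
      .load()

    // Write straight out to Parquet on HDFS.
    df.write.parquet("hdfs:///tmp/my_table_parquet") // placeholder path
    spark.stop()
  }
}
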
I want to know how to set/register custom Cassandra codecs in Spark.
I tried the solution provided in the link below, but it does not work; any ideas around this will be helpful.
https://stackoverflow.com/questions/40363611/adding-custom-codec-to-cassandraconnector/49750791?noredirect=1#comment86515847_49750791



Thanks & Regards
S. Revanth Kumar Reddy
https://github.com/Re1tReddy/
http://34.230.100.5:8080/crazyusers/login