Re: How to register cassandra custom codecs in spark? (https://github.com/datastax/spark-cassandra-connector/issues/1173)
Hi Revanth,

I took a quick look and don't think you can override existing mappings. You should see a warning like this:

18/04/15 19:50:27 WARN CodecRegistry: Ignoring codec CustomTSTypeCodec [date <-> java.lang.Long] because it collides with previously registered codec CustomTSTypeCodec [date <-> java.lang.Long]

The issue is that CodecRegistry does not allow you to override existing type mappings [1][2]. You can, however, register a codec that converts between a CQL type and your custom class.

Dinesh

[1] https://docs.datastax.com/en/developer/java-driver/3.1/manual/custom_codecs/
[2] https://github.com/datastax/java-driver/blob/3.x/driver-core/src/main/java/com/datastax/driver/core/CodecRegistry.java#L357

On Sunday, April 15, 2018, 1:06:53 AM PDT, Revanth Reddy wrote:

Hi Dinesh,

Thanks a lot for your response. I tried with the existing codec (SimpleTimestampCodec); it says the codec has been registered, but it is not converting the timestamp column to long, and I don't observe any error. Below are the steps I followed (the code I used is attached).

I used this property to set the codec in Spark:

config("spark.cassandra.connection.factory", "com.rasa.devops.cassandra.export.CustomConnectionFactory")

Since there is no direct way to refer to a custom codec in Spark, I followed the above approach.

I used the following dependencies:

- com.datastax.spark : spark-cassandra-connector_2.11 : 2.0.1
- com.datastax.cassandra : cassandra-driver-extras : 3.1.4
- joda-time : joda-time : 2.9.1

Cassandra table used for testing:

CREATE TABLE dev.my_table (
    partition_key text PRIMARY KEY,
    some_timestamp timestamp
);

cqlsh:dev> select * from my_table;

 partition_key | some_timestamp
---------------+----------------------
 key           | 2015-12-25 18:30:00+
 foo           | 2018-04-11 06:22:29+

Thanks & Regards
S. Revanth kumar reddy
https://github.com/Re1tReddy/
http://34.230.100.5:8080/crazyusers/login

On Sun, Apr 15, 2018 at 11:36 AM, Dinesh Joshi wrote:

Hi Revanth,

How do you register the custom codec?
Do you get any errors? Have you tried using a pre-existing codec? It would be helpful if you can give more information.

Dinesh

On Saturday, April 14, 2018, 7:29:30 PM PDT, Revanth Reddy wrote:

Hi Team,

I want to write a custom Cassandra codec and use that codec in my Spark application while reading data from a Cassandra table. Basically, custom codecs are used to convert one column type to another while reading from Cassandra. For example, I have a timestamp column in a Cassandra table, but I want to read it as long and save it as a Parquet file to HDFS; a custom codec would be useful here instead of doing an additional transformation in the Spark code.

I want to know how to set/register these custom Cassandra codecs in Spark. I tried the solution provided in the link below, but it does not work; any ideas around this will be helpful.

https://stackoverflow.com/questions/40363611/adding-custom-codec-to-cassandraconnector/49750791?noredirect=1#comment86515847_49750791

Thanks & Regards
S. Revanth kumar reddy
https://github.com/Re1tReddy/
http://34.230.100.5:8080/crazyusers/login
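For reference, the connection-factory approach Revanth describes can be sketched roughly as below. This is a sketch only, not a confirmed working setup: it assumes spark-cassandra-connector 2.0.x (CassandraConnectionFactory, DefaultConnectionFactory.clusterBuilder) and cassandra-driver-extras (SimpleTimestampCodec) on the classpath, and the package name is the hypothetical one from the thread. It cannot run without a live cluster.

```scala
package com.rasa.devops.cassandra.export  // hypothetical package name from the thread

import com.datastax.driver.core.{Cluster, CodecRegistry}
import com.datastax.driver.extras.codecs.date.SimpleTimestampCodec
import com.datastax.spark.connector.cql.{CassandraConnectionFactory, CassandraConnectorConf, DefaultConnectionFactory}

// Builds the cluster the same way the default factory would, but with a
// codec registry that maps the CQL timestamp type to java.lang.Long.
object CustomConnectionFactory extends CassandraConnectionFactory {
  override def createCluster(conf: CassandraConnectorConf): Cluster =
    DefaultConnectionFactory.clusterBuilder(conf)
      .withCodecRegistry(new CodecRegistry().register(SimpleTimestampCodec.instance))
      .build()
}
```

The class is then wired in through the "spark.cassandra.connection.factory" property, as shown in the thread.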
Time series column family design
Hi Experts,

We have a design requirement. For example:

CREATE TABLE test (
    vin text,
    create_date int,
    create_time timestamp,
    a text,
    b text,
    PRIMARY KEY ((vin, create_date), create_time)
) WITH CLUSTERING ORDER BY (create_time DESC);

We store data in this table like this:

ZD41578123DSAFWE12313 | 20180316 | 2018-03-16 20:51:33.00+0800 | P023  | P001
ZD41578123DSAFWE12313 | 20180315 | 2017-03-15 20:51:33.00+0800 | P000  | P001
ZD41578123DSAFWE12313 | 20180314 | 2017-03-14 20:51:33.00+0800 | P456  | P001
3431241241234         | 20180317 | 2017-03-17 20:51:33.00+0800 | P000  | P001
3431241241234         | 20180316 | 2017-03-16 20:51:33.00+0800 | P123  | P001
3431241241234         | 20180315 | 2017-03-15 20:51:33.00+0800 | P456  | P001
3431241241234         | 20180314 | 2017-03-14 20:51:33.00+0800 | P789  | P001
ZD41578123DSAFWE1     | 20180314 | 2017-03-14 20:51:33.00+0800 | P023  | P001
41034800994           | 20180313 | 2017-03-13 08:26:55.00+0800 | P0133 | P001
41034800994           | 20180312 | 2017-03-12 08:26:55.00+0800 | P0420 | P001

We know that we can only use "=" or "IN" for a partition key query. My question is: is there a convenient way to query a range of results, or another design for this requirement, for example 3 or 6 months backward from today? Currently we can only use:

SELECT * FROM test WHERE vin = 'ZD41578123DSAFWE12313' AND create_date IN (20180416, 20180415, 20180414, 20180413, 20180412, ...);

But this makes the CQL query very long, and I don't know whether there is a limit on the length of a CQL statement. Please give me some advice; thanks in advance.

Best Regards,
倪项菲 / David Ni
中移德电网络科技有限公司 Virtue Intelligent Network Ltd, co.
Add: 2003, 20F, No.35 Luojia creative city, Luoyu Road, Wuhan, HuBei
Mob: +86 13797007811 | Tel: +86 27 5024 2516
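A common workaround for this pattern is to keep the day-bucketed partition key but generate the bucket list client-side, then issue one query per bucket (or small IN batches) instead of one giant IN. A minimal sketch, assuming the yyyyMMdd int bucket format from the schema above (dateBuckets is a hypothetical helper, not a driver API):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helper: the create_date buckets (yyyyMMdd ints) covering the
// last `days` days ending at `end`, newest first. Each bucket can then be
// queried individually:
//   SELECT * FROM test WHERE vin = ? AND create_date = ?;
def dateBuckets(end: LocalDate, days: Int): Seq[Int] = {
  val fmt = DateTimeFormatter.ofPattern("yyyyMMdd")
  (0 until days).map(i => end.minusDays(i).format(fmt).toInt)
}
```

Issuing the per-day queries asynchronously keeps latency close to a single IN while letting the coordinator route each partition separately, and it sidesteps any concern about statement length.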
Re: Mailing list server IPs
Hi Jacques,

Thanks for bringing this up. I took a quick look through the INFRA project and saw a couple of resolved issues that might help:
https://issues.apache.org/jira/browse/INFRA-6584?jql=project%20%3D%20INFRA%20AND%20text%20~%20%22mail%20server%20whitelist%22

If those don't do it for you, please open a new issue with INFRA.

On Sat, Apr 14, 2018 at 1:19 AM, Jacques-Henri Berthemet <jacques-henri.berthe...@genesys.com> wrote:
> I checked with IT and I missed an email in the period where I got the last
> bounce. It's not a very big deal but I'd like to have it fixed if possible.
>
> Gmail servers are very picky on SMTP traffic and reject a lot of things.
>
> --
> Jacques-Henri Berthemet
>
> From: Nicolas Guyomar [mailto:nicolas.guyo...@gmail.com]
> Sent: Friday, April 13, 2018 3:15 PM
> To: user@cassandra.apache.org
> Subject: Re: Mailing list server IPs
>
> Hi,
>
> I receive similar messages from time to time, and I'm using Gmail ;) I
> believe I never missed a mail on the ML and that you can safely ignore this
> message.
>
> On 13 April 2018 at 15:06, Jacques-Henri Berthemet <jacques-henri.berthe...@genesys.com> wrote:
>
> Hi,
>
> I'm getting bounce messages from the ML from time to time, see attached
> example. Our IT told me that they need to whitelist all IPs used by the
> Cassandra ML server. Is there a way to get those IPs?
>
> Sorry if it's not really related to Cassandra itself, but I didn't find
> anything in the http://untroubled.org/ezmlm/ezman/ezman5.html commands.
>
> Regards,
> --
> Jacques-Henri Berthemet
>
> ---------- Forwarded message ----------
> From: "user-h...@cassandra.apache.org"
> To: Jacques-Henri Berthemet
> Cc:
> Bcc:
> Date: Fri, 6 Apr 2018 20:47:22 +
> Subject: Warning from user@cassandra.apache.org
>
> Hi! This is the ezmlm program. I'm managing the
> user@cassandra.apache.org mailing list.
>
> Messages to you from the user mailing list seem to
> have been bouncing. I've attached a copy of the first bounce
> message I received.
>
> If this message bounces too, I will send you a probe. If the probe bounces,
> I will remove your address from the user mailing list,
> without further notice.
>
> I've kept a list of which messages from the user mailing list have
> bounced from your address.
>
> Copies of these messages may be in the archive.
> To retrieve a set of messages 123-145 (a maximum of 100 per request),
> send a short message to:
>
> To receive a subject and author list for the last 100 or so messages,
> send a short message to:
>
> Here are the message numbers:
>
>    60535
>    60536
>    60548
>
> --- Enclosed is a copy of the bounce message I received.
>
> Return-Path: <>
> Received: (qmail 8848 invoked for bounce); 27 Mar 2018 14:22:11 -
> Date: 27 Mar 2018 14:22:11 -
> From: mailer-dae...@apache.org
> To: user-return-605...@cassandra.apache.org
> Subject: failure notice
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org

--
Nate McCall
Wellington, NZ
@zznate

CTO Apache Cassandra Consulting
http://www.thelastpickle.com
Re: Shifting data to DCOS
*nodetool ring* will give you the tokens for each node on the ring. Each node has the token range between the previous node's token and its own token, so the token range for each node is the interval (previous_token, this_token]. The first node in the ring has the range between the last node's token and its token (the "wrapping range").

Patrick Bannister

On Sun, Apr 15, 2018 at 11:23 AM, Faraz Mateen wrote:
> *UPDATE* - I created schema for all the tables in one of the keyspaces,
> copied data to the new directories and ran nodetool refresh. However, a lot
> of data seems to be missing.
>
> I ran nodetool repair on all three nodes one by one. The first two nodes
> took around 20 minutes (each) to complete. The third node took a lot of
> time to repair and did not complete even in 14 hours. Eventually I had to
> stop it manually.
>
> *nodetool compactionstats* gives me the "pending tasks by table name"
> traceback which can be viewed here:
> https://gist.github.com/farazmateen/10adce4b2477457f0e20fc95176f66a3
>
> *nodetool netstats* shows a lot of dropped gossip messages on all the
> nodes. Here is the output from one of the nodes:
>
> Mode: NORMAL
> Not sending any streams.
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 1
> Mismatch (Background): 2
> Pool Name          Active  Pending  Completed  Dropped
> Large messages     n/a     0        92         1
> Small messages     n/a     0        355491     0
> Gossip messages    n/a     5        3726945286613
>
> Is the problem related to token ranges? How can I find out the token range
> for each node? What can I do to further debug and root-cause this?
>
> On Tue, Apr 10, 2018 at 4:28 PM, Faraz Mateen wrote:
>
>> Sorry for the late reply. I was trying to figure out some other approach
>> to it.
>>
>> @Kurt - My previous cluster has 3 nodes but the replication factor is 2.
>> I am not exactly sure how I would handle the tokens. Can you explain that
>> a bit?
>>
>> @Michael - Actually, my DC/OS cluster has an older version than my
>> previous cluster. However, both of them have the hash in their data
>> directory names. The previous cluster is on version 3.9 while the new
>> DC/OS cluster is on 3.0.16.
>>
>> On Fri, Apr 6, 2018 at 2:35 PM, kurt greaves wrote:
>>
>>> Without looking at the code I'd say maybe the keyspaces are displayed
>>> purely because the directories exist (but it seems unlikely). The process
>>> you should follow instead is to exclude the system keyspaces for each node
>>> and manually apply your schema, then upload your CFs into the correct
>>> directory. Note this only works when RF=#nodes; if you have more nodes you
>>> need to take tokens into account when restoring.
>>>
>>> On Fri., 6 Apr. 2018, 17:16 Affan Syed wrote:
>>>
>>>> Michael,
>>>>
>>>> both of the folders are with hash, so I don't think that would be an
>>>> issue.
>>>>
>>>> What is strange is why the tables don't show up if the keyspaces are
>>>> visible. Shouldn't that be metadata that can be edited once and then be
>>>> visible?
>>>>
>>>> Affan
>>>>
>>>> On Thu, Apr 5, 2018 at 7:55 PM, Michael Shuler wrote:
>>>>
>>>>> On 04/05/2018 09:04 AM, Faraz Mateen wrote:
>>>>> >
>>>>> > For example, if the table is *data_main_bim_dn_10*, its data directory
>>>>> > is named data_main_bim_dn_10-a73202c02bf311e8b5106b13f463f8b9. I created
>>>>> > a new table with the same name through cqlsh. This resulted in creation
>>>>> > of another directory with a different hash, i.e.
>>>>> > data_main_bim_dn_10-c146e8d038c611e8b48cb7bc120612c9. I copied all data
>>>>> > from the former to the latter.
>>>>> >
>>>>> > Then I ran *"nodetool refresh ks1 data_main_bim_dn_10"*. After that I
>>>>> > was able to access all data contents through cqlsh.
>>>>> >
>>>>> > Now, the problem is, I have around 500 tables and the method I mentioned
>>>>> > above is quite cumbersome. Bulk loading through sstableloader or remote
>>>>> > seeding are also a couple of options, but they will take a lot of time.
>>>>> > Does anyone know an easier way to shift all my data to the new setup
>>>>> > on DC/OS?
>>>>>
>>>>> For upgrade support from older versions of C* that did not have the hash
>>>>> on the data directory, the table data dir can be just
>>>>> `data_main_bim_dn_10` without the appended hash, as in your example.
>>>>>
>>>>> Give that a quick test to see if that simplifies things for you.
>>>>>
>>>>> --
>>>>> Kind regards,
>>>>> Michael
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>> --
>> Faraz Mateen
>
> --
> Faraz Mateen
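The ownership rule Patrick describes can be made concrete with a small sketch (ownerOf is a hypothetical helper, not a nodetool or driver API): with the ring's tokens sorted, a token's owner is the first node token greater than or equal to it, wrapping around to the first node past the end of the ring.

```scala
// Sketch of the rule above: each node owns (previous_token, its_token],
// and the first node also owns the wrapping range past the last token.
// `sortedRingTokens` is the sorted token list as reported by `nodetool ring`.
def ownerOf(token: Long, sortedRingTokens: Seq[Long]): Long =
  sortedRingTokens.find(token <= _).getOrElse(sortedRingTokens.head)
```

This is why restoring copied SSTables only works cleanly when token assignments match: if the new ring's tokens differ, rows land on nodes that do not own their token range until a repair moves them.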
Re: Shifting data to DCOS
*UPDATE* - I created schema for all the tables in one of the keyspaces, copied data to the new directories and ran nodetool refresh. However, a lot of data seems to be missing.

I ran nodetool repair on all three nodes one by one. The first two nodes took around 20 minutes (each) to complete. The third node took a lot of time to repair and did not complete even in 14 hours. Eventually I had to stop it manually.

*nodetool compactionstats* gives me the "pending tasks by table name" traceback which can be viewed here:
https://gist.github.com/farazmateen/10adce4b2477457f0e20fc95176f66a3

*nodetool netstats* shows a lot of dropped gossip messages on all the nodes. Here is the output from one of the nodes:

Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 1
Mismatch (Background): 2
Pool Name          Active  Pending  Completed  Dropped
Large messages     n/a     0        92         1
Small messages     n/a     0        355491     0
Gossip messages    n/a     53726945286613

Is the problem related to token ranges? How can I find out the token range for each node? What can I do to further debug and root-cause this?

On Tue, Apr 10, 2018 at 4:28 PM, Faraz Mateen wrote:
> Sorry for the late reply. I was trying to figure out some other approach
> to it.
>
> @Kurt - My previous cluster has 3 nodes but the replication factor is 2.
> I am not exactly sure how I would handle the tokens. Can you explain that
> a bit?
>
> @Michael - Actually, my DC/OS cluster has an older version than my
> previous cluster. However, both of them have the hash in their data
> directory names. The previous cluster is on version 3.9 while the new
> DC/OS cluster is on 3.0.16.
>
> On Fri, Apr 6, 2018 at 2:35 PM, kurt greaves wrote:
>
>> Without looking at the code I'd say maybe the keyspaces are displayed
>> purely because the directories exist (but it seems unlikely). The process
>> you should follow instead is to exclude the system keyspaces for each node
>> and manually apply your schema, then upload your CFs into the correct
>> directory. Note this only works when RF=#nodes; if you have more nodes you
>> need to take tokens into account when restoring.
>>
>> On Fri., 6 Apr. 2018, 17:16 Affan Syed wrote:
>>
>>> Michael,
>>>
>>> both of the folders are with hash, so I don't think that would be an
>>> issue.
>>>
>>> What is strange is why the tables don't show up if the keyspaces are
>>> visible. Shouldn't that be metadata that can be edited once and then be
>>> visible?
>>>
>>> Affan
>>>
>>> On Thu, Apr 5, 2018 at 7:55 PM, Michael Shuler wrote:
>>>
>>>> On 04/05/2018 09:04 AM, Faraz Mateen wrote:
>>>> >
>>>> > For example, if the table is *data_main_bim_dn_10*, its data directory
>>>> > is named data_main_bim_dn_10-a73202c02bf311e8b5106b13f463f8b9. I created
>>>> > a new table with the same name through cqlsh. This resulted in creation
>>>> > of another directory with a different hash, i.e.
>>>> > data_main_bim_dn_10-c146e8d038c611e8b48cb7bc120612c9. I copied all data
>>>> > from the former to the latter.
>>>> >
>>>> > Then I ran *"nodetool refresh ks1 data_main_bim_dn_10"*. After that I
>>>> > was able to access all data contents through cqlsh.
>>>> >
>>>> > Now, the problem is, I have around 500 tables and the method I mentioned
>>>> > above is quite cumbersome. Bulk loading through sstableloader or remote
>>>> > seeding are also a couple of options, but they will take a lot of time.
>>>> > Does anyone know an easier way to shift all my data to the new setup
>>>> > on DC/OS?
>>>>
>>>> For upgrade support from older versions of C* that did not have the hash
>>>> on the data directory, the table data dir can be just
>>>> `data_main_bim_dn_10` without the appended hash, as in your example.
>>>>
>>>> Give that a quick test to see if that simplifies things for you.
>>>>
>>>> --
>>>> Kind regards,
>>>> Michael
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>
> --
> Faraz Mateen

--
Faraz Mateen
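Since the pain point in this thread is repeating the copy-and-refresh dance for ~500 tables, the directory matching can at least be scripted. A sketch under stated assumptions (dirFor is a hypothetical helper; it relies on the <table>-<hash> directory naming shown in the thread, and on CQL table names never containing '-'):

```scala
// Hypothetical helper for scripting the per-table copy: given the directory
// names under a keyspace's data dir, find the hashed directory belonging to
// a table. Directory names look like "<table>-<32-char hash>", and since CQL
// table names cannot contain '-', everything before the last '-' is the table.
def dirFor(table: String, dirNames: Seq[String]): Option[String] =
  dirNames.find(d => d.take(d.lastIndexOf('-')) == table)
```

Mapping each of the 500 old table directories to its newly created hashed directory this way turns the manual procedure into a loop of copy + `nodetool refresh` calls.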
Re: How to register cassandra custom codecs in spark? (https://github.com/datastax/spark-cassandra-connector/issues/1173)
Hi Revanth,

How do you register the custom codec? Do you get any errors? Have you tried using a pre-existing codec? It would be helpful if you can give more information.

Dinesh

On Saturday, April 14, 2018, 7:29:30 PM PDT, Revanth Reddy wrote:

Hi Team,

I want to write a custom Cassandra codec and use that codec in my Spark application while reading data from a Cassandra table. Basically, custom codecs are used to convert one column type to another while reading from Cassandra. For example, I have a timestamp column in a Cassandra table, but I want to read it as long and save it as a Parquet file to HDFS; a custom codec would be useful here instead of doing an additional transformation in the Spark code.

I want to know how to set/register these custom Cassandra codecs in Spark. I tried the solution provided in the link below, but it does not work; any ideas around this will be helpful.

https://stackoverflow.com/questions/40363611/adding-custom-codec-to-cassandraconnector/49750791?noredirect=1#comment86515847_49750791

Thanks & Regards
S. Revanth kumar reddy
https://github.com/Re1tReddy/
http://34.230.100.5:8080/crazyusers/login