Re: Read performance in map data type
I've observed that reducing the fetch size results in better latency (isn't that obvious :-)). I tried fetch sizes varying from 100 down to 1, and saw a lot of errors at 1. I haven't tried modifying the number of columns. Let me start a new thread focused on fetch size.

On Wed, Apr 2, 2014 at 9:53 AM, Sourabh Agrawal <iitr.sour...@gmail.com> wrote:

From the doc: "The fetch size controls how much resulting rows will be retrieved simultaneously." So I guess it does not depend on the number of columns as such. As all the columns for a key reside on the same node, I think the number of columns wouldn't matter much as long as we have enough memory in the app. The default value is 5000 (com.datastax.driver.core.QueryOptions). We use it with the default value. I have never profiled Cassandra for read load. If you profile it for different fetch sizes, please share the results :)

--
Sourabh Agrawal
Bangalore
+91 9945657973

On Wed, Apr 2, 2014 at 8:45 AM, Apoorva Gaurav <apoorva.gau...@myntra.com> wrote:

Thanks Sourabh. I've modelled my table as (studentID int, subjectID int, marks int, PRIMARY KEY(studentID, subjectID)), as primarily I'll be querying by studentID and sometimes by studentID and subjectID. I've tried driver 2.0.0 and it's giving good results; I'm also using its auto-paging feature. Any idea what a typical value for fetch size should be? And does the fetch size depend on how many columns there are in the CQL table? For example, should the fetch size for a table like (studentID int, subjectID int, marks1 int, marks2 int, marks3 int ... marksN int, PRIMARY KEY(studentID, subjectID)) be less than the fetch size for (studentID int, subjectID int, marks int, PRIMARY KEY(studentID, subjectID))?

On Wed, Apr 2, 2014 at 2:20 AM, Robert Coli <rc...@eventbrite.com> wrote:

On Mon, Mar 31, 2014 at 9:13 PM, Apoorva Gaurav <apoorva.gau...@myntra.com> wrote:

Thanks Robert. Is there a workaround? In our test setups we keep dropping and recreating tables.

Use unique keyspace (or table) names for each test? That's the approach they're taking in 5202...

=Rob

--
Thanks & Regards,
Apoorva
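The default Sourabh mentions comes from QueryOptions, and it can be overridden cluster-wide when the Cluster is built rather than on every statement. A minimal sketch against Java driver 2.0 (the contact point and the value 1000 are placeholders, not recommendations):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.QueryOptions;

public class DefaultFetchSize {
    public static void main(String[] args) {
        // Override the driver-wide default fetch size (5000) at build time;
        // individual statements can still override it per query.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withQueryOptions(new QueryOptions().setFetchSize(1000))
                .build();
        System.out.println("default fetch size: "
                + cluster.getConfiguration().getQueryOptions().getFetchSize());
        cluster.close();
    }
}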
optimum fetch size in datastax driver
Hello All,

We have a schema which can be modelled as (studentID int, subjectID int, marks int, PRIMARY KEY(studentID, subjectID)). There can be ~1M studentIDs, and for each studentID there can be ~10K subjectIDs. Queries can be by studentID or by studentID and subjectID. We have a 3-node (each having 24 cores) Apache Cassandra 2.0.4 cluster and are using DataStax driver 2.0.0 to interact with it, using its automatic paging feature. I've tried various fetch sizes from 100 to 10K and observed that read latency increases with fetch size (which looks obvious). At around 10K there are a lot of errors. I want to understand:

- Is there a rule of thumb for deciding on the optimum fetch size (com.datastax.driver.core.Statement.setFetchSize())?
- Does Cassandra keep the entire result in cache and only return the rows corresponding to the fetch size, or does it treat each subsequent page as a new query (com.datastax.driver.core.ResultSet.fetchMoreResults())?
- Does the optimum fetch size depend on the number of columns in the CQL table? For example, should the fetch size for a table like (studentID int, subjectID int, marks1 int, marks2 int, marks3 int ... marksN int, PRIMARY KEY(studentID, subjectID)) be less than the fetch size for (studentID int, subjectID int, marks int, PRIMARY KEY(studentID, subjectID))?

--
Thanks & Regards,
Apoorva
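A minimal sketch of the per-statement knob in question, assuming driver 2.0 and that the (studentID, subjectID, marks) table from this thread lives in a hypothetical keyspace "school" as a table "marks_by_student". On the second question: with native-protocol paging, each page is requested separately using the paging state the server returned with the previous page, so the server does not keep the full result cached between pages.

import com.datastax.driver.core.*;

public class FetchSizeExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("school"); // hypothetical keyspace

        PreparedStatement ps = session.prepare(
                "SELECT subjectID, marks FROM marks_by_student WHERE studentID = ?");

        BoundStatement bound = ps.bind(42);
        bound.setFetchSize(500); // rows per page; the driver default is 5000

        // Iterating past the end of a page transparently triggers
        // fetchMoreResults() for the next one.
        for (Row row : session.execute(bound)) {
            System.out.println(row.getInt("subjectID") + " -> " + row.getInt("marks"));
        }
        cluster.close();
    }
}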
Inserts with a dynamic datamodel using Datastax java driver
Hello,

I am building a write client in Java to insert records into Cassandra 2.0.5. I am using the DataStax Java driver.

Problem: The data model is dynamic. By dynamic, I mean that the number of columns and the datatypes of the columns are given as input by the user. There is only 1 keyspace and 1 column family. For inserting records, bound statements seem the way to go, but the bind() function accepts only a sequence of Objects (column values). How do I bind the values when the number and datatypes of the columns are given as input?

Any suggestions?

Thanks & Regards,
Varsha
Re: Inserts with a dynamic datamodel using Datastax java driver
Hello Varsha

Your best bet is to go with the blob type, serializing all the data into bytes. Another alternative is to use text and serialize to JSON. For the dynamic columns, use clustering columns in CQL3 with the blob/text type.

Regards

Duy Hai DOAN
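A sketch of the layout Duy Hai describes, with hypothetical keyspace/table/column names: each user-defined column becomes one clustering-key value, and every value is serialized into the single declared type (blob here; text holding JSON works the same way).

import com.datastax.driver.core.*;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.UUID;

public class DynamicColumnsSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        session.execute("CREATE KEYSPACE IF NOT EXISTS ks_1 WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 1}");

        // One CQL row per (record, user-defined column). All values share one
        // declared type, so any user datatype fits once serialized.
        session.execute("CREATE TABLE IF NOT EXISTS ks_1.cf_1 ("
                + " record_id uuid,"
                + " col_name text,"
                + " col_value blob,"
                + " PRIMARY KEY (record_id, col_name))");

        PreparedStatement ps = session.prepare(
                "INSERT INTO ks_1.cf_1 (record_id, col_name, col_value) VALUES (?, ?, ?)");

        // A varint value serialized to bytes; the client must remember the
        // real type (e.g. from the parsed schema file) to deserialize later.
        session.execute(ps.bind(UUID.randomUUID(), "column2",
                ByteBuffer.wrap(BigInteger.valueOf(12345).toByteArray())));

        cluster.close();
    }
}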
Exporting column family data to csv
I want to export all the data of a particular column family to a text file from a Cassandra cluster. I tried:

copy keyspace.mycolumnfamily to '/root/ddd/xx.csv';

It gave me a timeout error, so I tried the below in cassandra.yaml:

request_timeout_in_ms: 1000
read_request_timeout_in_ms: 1000
range_request_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 1000

I still have no luck. Any advice on how to achieve this? I am NOT limited to the COPY command. What is the best way to achieve this?

Thanks in advance for the help.
ng
RE: Exporting column family data to csv
http://mail-archives.apache.org/mod_mbox/cassandra-user/201309.mbox/%3C9AF3ADEDDFED4DDEA840B8F5C6286BBA@vig.local%3E
http://stackoverflow.com/questions/18872422/rpc-timeout-error-while-exporting-data-from-cql

Google for more.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer, Adform
RE: Inserts with a dynamic datamodel using Datastax java driver
Hi,

Thanks for replying. I didn't quite get what you meant by "use clustering columns in CQL3 with blob/text type". I have elaborated my problem statement below.

Assume the schema of the keyspace into which random records need to be inserted is given in the following format:

KeySpace Name: KS_1
ColumnFamily Name: CF_1
Columns: [Column1: uuid, Column2: varint, Column3: timestamp, ... ColumnN: text]

I parse this file to get the schema. The data/value for each column should be generated randomly depending on the datatype of the column. My question is how do I insert the records?

1. I created a prepared statement depending on the number of columns (using a for loop). Then for each record I called methods like setDate() or setVarint() to bind the values. But this was taking too much time, because data was being generated for each column, then set in the prepared statement, and then inserted. And the number of records = 1 billion!!
2. The executeAsync() function seemed likely to be faster. But the problem is that the bind() function takes a sequence of values. Since the number of columns is variable, I am not able to make this code generic (i.e. to cater to any schema given by the user).

I am not sure if there is another way to approach this problem.

Thanks & Regards,
Varsha
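On point 2: bind() is a varargs method, bind(Object... values), so it also accepts a plain Object[] built at runtime; the argument list does not have to be fixed at compile time. A rough sketch under that assumption, with placeholder names standing in for the parsed schema:

import com.datastax.driver.core.*;
import java.math.BigInteger;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.UUID;

public class GenericInsert {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Column names as parsed from the user-supplied schema (placeholders).
        List<String> columns = Arrays.asList("column1", "column2", "column3");

        // Build "INSERT INTO ks_1.cf_1 (column1, ...) VALUES (?, ...)" once.
        StringBuilder names = new StringBuilder();
        StringBuilder marks = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) { names.append(", "); marks.append(", "); }
            names.append(columns.get(i));
            marks.append("?");
        }
        PreparedStatement ps = session.prepare(
                "INSERT INTO ks_1.cf_1 (" + names + ") VALUES (" + marks + ")");

        // Randomly generated values in schema order: uuid, varint, timestamp.
        // bind(Object...) accepts the array directly.
        Object[] values = { UUID.randomUUID(), BigInteger.valueOf(42), new Date() };
        ResultSetFuture future = session.executeAsync(ps.bind(values));
        future.getUninterruptibly(); // in a real loader, collect futures instead

        cluster.close();
    }
}

For a billion records the win comes from keeping many executeAsync() calls in flight at once, but the number of outstanding futures needs to be capped (e.g. with a semaphore), or the client will run out of memory.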
Re: Drop in node replacements.
Cassandra 1.2.15, using commodity hardware.

On Tue, Apr 1, 2014 at 6:37 PM, Robert Coli <rc...@eventbrite.com> wrote:

On Tue, Apr 1, 2014 at 3:24 PM, Redmumba <redmu...@gmail.com> wrote:

Is it possible to have true drop-in node replacements? For example, I have a cluster of 51 Cassandra nodes, 17 in each data center. I had one host go down in DC3, and when it came back up, it joined the ring, etc., but was not receiving any data. Even after multiple restarts and forcing a repair on the entire fleet, it still holds maybe ~30MB in a cluster that is absorbing ~1.2TB a day.

What version of Cassandra? Real hardware/network or virtual?

=Rob
Re: Exporting column family data to csv
Thanks for the reply. Most of the solutions provided on the web involve some kind of WHERE clause in the data extract, exporting set after set until done. I have a column family with no timestamp and no other column I can use to filter the data. One other suggested solution was to use pagination, but I could not find an example anywhere on the web that achieves this. This cannot be that hard! I must be missing something. Please advise.
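The pagination being asked about does not need a WHERE clause: with the native protocol, the Java driver can stream a full-table SELECT page by page and write rows out as they arrive. A rough sketch with placeholder keyspace/table names, assuming a 2.x Java driver; note that Row.getObject is an assumption that only holds on later 2.x releases (older ones need a typed getter per column), and the naive value printing below does no CSV quoting/escaping.

import com.datastax.driver.core.*;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class CsvExport {
    public static void main(String[] args) throws IOException {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        Statement stmt = new SimpleStatement("SELECT * FROM ks1.mycolumnfamily");
        stmt.setFetchSize(1000); // stream 1000 rows per page instead of all at once

        PrintWriter out = new PrintWriter(new FileWriter("/root/ddd/xx.csv"));
        try {
            ResultSet rs = session.execute(stmt);
            ColumnDefinitions cols = rs.getColumnDefinitions();
            // Iteration fetches further pages transparently, so memory use is
            // bounded by the fetch size rather than by the table size.
            for (Row row : rs) {
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < cols.size(); i++) {
                    if (i > 0) line.append(',');
                    line.append(row.getObject(i)); // assumed available, see note
                }
                out.println(line);
            }
        } finally {
            out.close();
            cluster.close();
        }
    }
}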