Re: Read performance in map data type

2014-04-02 Thread Apoorva Gaurav
I've observed that reducing fetch size results in better latency (isn't
that obvious :-)). I tried fetch sizes varying from 100 to 10K and am seeing
a lot of errors at 10K. Haven't tried modifying the number of columns.

Let me start a new thread focused on fetch size.


On Wed, Apr 2, 2014 at 9:53 AM, Sourabh Agrawal iitr.sour...@gmail.com wrote:

 From the doc : The fetch size controls how much resulting rows will be
 retrieved simultaneously.
 So, I guess it does not depend on the number of columns as such. As all
 the columns for a key reside on the same node, I think it wouldn't matter
 much whatever the number of columns, as long as we have enough memory in
 the app.

 Default value is 5000. (com.datastax.driver.core.QueryOptions)

 We use it with the default value. I have never profiled cassandra for read
 load. If you profile it for different fetch sizes, please share the results
 :)


 On Wed, Apr 2, 2014 at 8:45 AM, Apoorva Gaurav
 apoorva.gau...@myntra.com wrote:

 Thanks Sourabh,

 I've modelled my table as (studentID int, subjectID int, marks int,
 PRIMARY KEY(studentID, subjectID)), as primarily I'll be querying using
 studentID and sometimes using studentID and subjectID.

 I've tried driver 2.0.0 and it's giving good results, also using its auto
 paging feature. Any idea what a typical value for fetch size should be? And
 does the fetch size depend on how many columns there are in the CQL table?
 For e.g., should fetch size in a table like (studentID int, subjectID
 int, marks1 int, marks2 int, marks3 int ... marksN int, PRIMARY
 KEY(studentID, subjectID)) be less than fetch size in (studentID int,
 subjectID int, marks int, PRIMARY KEY(studentID, subjectID))?


 On Wed, Apr 2, 2014 at 2:20 AM, Robert Coli rc...@eventbrite.com wrote:

  On Mon, Mar 31, 2014 at 9:13 PM, Apoorva Gaurav 
 apoorva.gau...@myntra.com wrote:

 Thanks Robert. Is there a workaround? In our test setups we keep
 dropping and recreating tables.


 Use unique keyspace (or table) names for each test? That's the approach
 they're taking in 5202...

 =Rob




 --
 Thanks & Regards,
 Apoorva




 --
 Sourabh Agrawal
 Bangalore
 +91 9945657973




-- 
Thanks & Regards,
Apoorva


optimum fetch size in datastax driver

2014-04-02 Thread Apoorva Gaurav
Hello All,

We have a schema which can be modelled as *(studentID int, subjectID int,
marks int, PRIMARY KEY(studentID, subjectID))*. There can be ~1M studentIDs,
and for each studentID there can be ~10K subjectIDs. The queries can be
using studentID or studentID-subjectID. We have a 3 node (each having 24
cores) Apache Cassandra 2.0.4 cluster and are using DataStax driver 2.0.0
to interact with it, using its automatic paging feature. I've tried various
fetch sizes varying from 100 to 10K and observed that read latency increases
with fetch size (which looks obvious). At around 10K there are a lot of
errors. I want to understand:

   - Is there a rule of thumb for deciding on the optimum fetch size
   (*com.datastax.driver.core.Statement.setFetchSize()*)?
   - Does Cassandra keep the entire result in cache and only return the
   rows corresponding to the fetch size, or does it treat subsequent fetches
   as new queries (*com.datastax.driver.core.ResultSet.fetchMoreResults()*)?
   - Does the optimum fetch size depend on the number of columns in the CQL
   table? For e.g., should fetch size in a table like *(studentID int,
   subjectID int, marks1 int, marks2 int, marks3 int ... marksN int, PRIMARY
   KEY(studentID, subjectID))* be less than fetch size in *(studentID int,
   subjectID int, marks int, PRIMARY KEY(studentID, subjectID))*?
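A back-of-the-envelope way to reason about the first question (a sketch only;
PagingMath and pagesNeeded are hypothetical helper names, not driver API): with
the driver's automatic paging, each page is one round trip, so a smaller fetch
size means smaller responses but more round trips per partition.

```java
// Sketch: estimate the round trips implied by a fetch size, assuming the
// driver fetches one page per request (as automatic paging does).
public class PagingMath {
    // Number of pages (round trips) needed to stream rowCount rows with the
    // given fetch size. Hypothetical helper for reasoning, not a driver API.
    static int pagesNeeded(int rowCount, int fetchSize) {
        if (fetchSize <= 0) throw new IllegalArgumentException("fetchSize must be > 0");
        return (rowCount + fetchSize - 1) / fetchSize; // ceiling division
    }

    public static void main(String[] args) {
        // ~10K subjectIDs per studentID, as in the schema above.
        System.out.println(pagesNeeded(10_000, 100));   // 100 round trips
        System.out.println(pagesNeeded(10_000, 5_000)); // 2 round trips (driver default)
    }
}
```

The sweet spot balances per-trip overhead against page payload size; the
errors seen around a fetch size of 10K are likely pages large enough to hit
timeouts, not a round-trip problem.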


-- 
Thanks & Regards,
Apoorva


Inserts with a dynamic datamodel using Datastax java driver

2014-04-02 Thread Raveendran, Varsha IN BLR STS
Hello,

I am building a write client in Java to insert records into Cassandra 2.0.5.
I am using the DataStax Java driver.

Problem: The datamodel is dynamic. By dynamic, I mean that the number of
columns and the datatypes of the columns will be given as input by the user.
There is only 1 keyspace and 1 column family.

For inserting records, bound statements seem the way to go. But the bind()
function accepts only a sequence of Objects (column values).
How do I bind the values when the number and datatypes of the columns are
given as input? Any suggestions?

 Thanks & Regards,
Varsha




Re: Inserts with a dynamic datamodel using Datastax java driver

2014-04-02 Thread DuyHai Doan
Hello Varsha

 Your best bet is to go with the blob type, serializing all data into
bytes. Another alternative is to use text and serialize to JSON.

 For the dynamic columns, use clustering columns in CQL3 with blob/text type

 Regards

 Duy Hai DOAN
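A minimal sketch of the text/JSON route mentioned above, using only the JDK
(JsonPayload and toJson are illustrative names; the hand-rolled serializer
handles only strings and numbers, and a real client would use a JSON library):
flatten the user-defined columns into one JSON string and store it in a single
text column, so the table keeps a fixed (key, payload text) shape.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: serialize a dynamic set of columns into one JSON text value.
// Minimal on purpose: only strings and numbers, only quote/backslash escaping.
public class JsonPayload {
    static String toJson(Map<String, Object> columns) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : columns.entrySet()) {
            if (!first) sb.append(",");
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":");
            Object v = e.getValue();
            if (v instanceof Number) sb.append(v);                       // bare number
            else sb.append('"').append(escape(String.valueOf(v))).append('"'); // quoted string
        }
        return sb.append("}").toString();
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("Column2", 42);
        row.put("ColumnN", "some text");
        System.out.println(toJson(row)); // {"Column2":42,"ColumnN":"some text"}
    }
}
```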


On Wed, Apr 2, 2014 at 11:21 AM, Raveendran, Varsha IN BLR STS 
varsha.raveend...@siemens.com wrote:

  Hello,

 I am building a write client in java to insert records into  Cassandra
 2.0.5.  I am using the Datastax java driver.

 *Problem** : * The datamodel is dynamic. By dynamic, I mean that the
 number of columns and the datatype of columns will be given as an input by
 the user.  It has only 1 keyspace and 1 column family.

 For inserting records bound statements seems the way to go.  But the
 bind() function accepts only a sequence of Objects  ( column values) .
 How do I bind the values when the number and datatype of columns is given
 as input? Any suggestions?

  Thanks & Regards,
 Varsha





Exporting column family data to csv

2014-04-02 Thread ng
I want to export all the data of a particular column family to a text file
from the Cassandra cluster.

I tried

copy keyspace.mycolumnfamily to '/root/ddd/xx.csv';

It gave me a timeout error.

I tried below in Cassandra.yaml

request_timeout_in_ms: 1000
read_request_timeout_in_ms: 1000
range_request_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 1000

I still have no luck. Any advice on how to achieve this? I am NOT limited to
the copy command. What is the best way to achieve this? Thanks in advance for
the help.
ng


RE: Exporting column family data to csv

2014-04-02 Thread Viktor Jevdokimov
http://mail-archives.apache.org/mod_mbox/cassandra-user/201309.mbox/%3C9AF3ADEDDFED4DDEA840B8F5C6286BBA@vig.local%3E

http://stackoverflow.com/questions/18872422/rpc-timeout-error-while-exporting-data-from-cql

Google for more.


Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-03163 Vilnius, Lithuania
Follow us on Twitter: @adforminsider http://twitter.com/#!/adforminsider
Experience Adform DNA http://vimeo.com/76421547

[Adform News] http://www.adform.com
[Adform awarded the Best Employer 2012] 
http://www.adform.com/site/blog/adform/adform-takes-top-spot-in-best-employer-survey/


Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.

From: ng [mailto:pipeli...@gmail.com]
Sent: Wednesday, April 2, 2014 6:04 PM
To: user@cassandra.apache.org
Subject: Exporting column family data to csv


I want to export all the data of particular column family to the text file from 
Cassandra cluster.

I tried

copy keyspace.mycolumnfamily to '/root/ddd/xx.csv';

It gave me timeout error

I tried below in Cassandra.yaml

request_timeout_in_ms: 1000
read_request_timeout_in_ms: 1000
range_request_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 1000

I still have no luck. Any advise how to achieve this? I am NOT limited to copy 
command.  What is the best way to achieve this? Thanks in advance for the help.
ng

RE: Inserts with a dynamic datamodel using Datastax java driver

2014-04-02 Thread Raveendran, Varsha IN BLR STS
Hi,

Thanks for replying.

I didn't quite get what you meant by "use clustering columns in CQL3 with
blob/text type".

I have elaborated my problem statement below.
Assume the schema of the keyspace to which random records need to be inserted 
is given in the following format :
KeySpace Name: KS_1
ColumnFamilyName: CF_1
Columns: [Column1: uuid, Column2: varint, Column3: timestamp, ...
ColumnN: text]


So I parse this file to get the schema.  Also, the data/value for each column 
should be generated randomly depending on the datatype of the column.
My question is how do I insert the records?


1. I created a prepared statement depending on the number of columns
(using a for loop). Then for each record I called methods like setDate() or
setVarint() to bind the values.

But this was taking too much time because data was being generated for each
column, then set in the prepared statement, and then inserted. And the number
of records = 1 billion!!

2. The executeAsync() function seemed more likely to be faster. But the
problem is that the bind() function takes a sequence of values. Since the
number of columns is variable I am not able to make this code generic (i.e.
to cater to any schema given by the user).



I am not sure if there is another way to approach this problem.
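One detail that may help with point 2: bind(Object...) is a varargs method, so
a runtime-built Object[] of column values can be passed to it directly. The
sketch below (DynamicInsert and buildInsert are illustrative names; KS_1/CF_1
follow the example schema) generates the INSERT string once from the parsed
schema, so a single prepared statement can be reused for every record:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: generate one INSERT statement from a parsed schema, prepare it
// once, and reuse it. Since bind(Object...) is varargs, a runtime-built
// Object[] of values can be passed as-is: prepared.bind(values).
public class DynamicInsert {
    static String buildInsert(String keyspace, String table, List<String> columns) {
        StringBuilder cols = new StringBuilder();
        StringBuilder marks = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) { cols.append(", "); marks.append(", "); }
            cols.append(columns.get(i));
            marks.append("?"); // one bind marker per column
        }
        return "INSERT INTO " + keyspace + "." + table
                + " (" + cols + ") VALUES (" + marks + ")";
    }

    public static void main(String[] args) {
        String cql = buildInsert("KS_1", "CF_1",
                Arrays.asList("Column1", "Column2", "Column3"));
        System.out.println(cql);
        // INSERT INTO KS_1.CF_1 (Column1, Column2, Column3) VALUES (?, ?, ?)
    }
}
```

session.execute(prepared.bind(values)), or executeAsync with a cap on in-flight
futures, then works without knowing the column count at compile time.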


Thanks & Regards,
Varsha


From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: Wednesday, April 02, 2014 4:05 PM
To: user@cassandra.apache.org
Subject: Re: Inserts with a dynamic datamodel using Datastax java driver

Hello Varsha

 Your best bet is to go with blob type by serializing all data into bytes. 
Another alternative is to use text and serialize to JSON.

 For the dynamic columns, use clustering columns in CQL3 with blob/text type

 Regards

 Duy Hai DOAN

On Wed, Apr 2, 2014 at 11:21 AM, Raveendran, Varsha IN BLR STS 
varsha.raveend...@siemens.com wrote:
Hello,

I am building a write client in java to insert records into  Cassandra 2.0.5.  
I am using the Datastax java driver.

Problem : The datamodel is dynamic. By dynamic, I mean that the number of 
columns and the datatype of columns will be given as an input by the user.  It 
has only 1 keyspace and 1 column family.

For inserting records bound statements seems the way to go.  But the bind() 
function accepts only a sequence of Objects  ( column values) .
How do I bind the values when the number and datatype of columns is given as 
input? Any suggestions?

Thanks & Regards,
Varsha





Re: Drop in node replacements.

2014-04-02 Thread Redmumba
Cassandra 1.2.15, using commodity hardware.


On Tue, Apr 1, 2014 at 6:37 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Apr 1, 2014 at 3:24 PM, Redmumba redmu...@gmail.com wrote:

 Is it possible to have true drop in node replacements?  For example, I
 have a cluster of 51 Cassandra nodes, 17 in each data center.  I had one
 host go down on DC3, and when it came back up, it joined the ring, etc.,
 but was not receiving any data.  Even after multiple restarts and forcing a
 repair on the entire fleet, it still holds maybe ~30MB on a cluster that is
 absorbing ~1.2TB a day.


 What version of Cassandra? Real hardware/network or virtual?

 =Rob




Re: Exporting column family data to csv

2014-04-02 Thread ng
Thanks for the reply. Most of the solutions provided over the web involve
some kind of 'where' clause in the data extract, exporting the next set
until done. I have a column family with no timestamp and no other column I
can use to filter the data. One other solution provided was to use
pagination, but I could not find any example anywhere over the web that
achieves this. This can not be that hard! I must be missing something.
Please advise.
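For what it's worth, the pagination approach can be driven with CQL's token()
function; the sketch below only builds the query strings (keyspace.mycolumnfamily,
key, col1/col2 are placeholders for your own names): start with no lower bound,
then resume each page strictly after the last token seen, until a page returns
fewer than pageSize rows.

```java
// Sketch: token-based pagination for a full-table export.
// Table and column names are placeholders; substitute your own.
public class TokenPaging {
    // First page: no lower bound on the partition key token.
    static String firstPage(int pageSize) {
        return "SELECT key, token(key), col1, col2 FROM keyspace.mycolumnfamily"
                + " LIMIT " + pageSize;
    }

    // Subsequent pages: resume strictly after the last token seen, and
    // repeat until a page comes back with fewer than pageSize rows.
    static String nextPage(long lastToken, int pageSize) {
        return "SELECT key, token(key), col1, col2 FROM keyspace.mycolumnfamily"
                + " WHERE token(key) > " + lastToken
                + " LIMIT " + pageSize;
    }

    public static void main(String[] args) {
        System.out.println(firstPage(1000));
        System.out.println(nextPage(-3485513579396041028L, 1000));
    }
}
```

With the 2.0 Java driver there is also a simpler route: iterate a ResultSet
with a modest setFetchSize and write each row out, since the driver pages
transparently under the hood.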

On Wednesday, April 2, 2014, Viktor Jevdokimov viktor.jevdoki...@adform.com
wrote:


 http://mail-archives.apache.org/mod_mbox/cassandra-user/201309.mbox/%3C9AF3ADEDDFED4DDEA840B8F5C6286BBA@vig.local%3E




 http://stackoverflow.com/questions/18872422/rpc-timeout-error-while-exporting-data-from-cql



 Google for more.





 *From:* ng [mailto:pipeli...@gmail.com]
 *Sent:* Wednesday, April 2, 2014 6:04 PM
 *To:* user@cassandra.apache.org
 *Subject:* Exporting column family data to csv




 I want to export all the data of particular column family to the text file
 from Cassandra cluster.

 I tried

 copy keyspace.mycolumnfamily to '/root/ddd/xx.csv';

 It gave me timeout error

 I tried below in Cassandra.yaml

 request_timeout_in_ms: 1000
 read_request_timeout_in_ms: 1000
 range_request_timeout_in_ms: 1000
 truncate_request_timeout_in_ms: 1000

 I still have no luck. Any advise how to achieve this? I am NOT limited to
 copy command.  What is the best way to achieve this? Thanks in advance for
 the help.

 ng
