Re: Problems in the cassandra bulk loader

2013-10-15 Thread José Elias Queiroga da Costa Araújo
Hi Viktor,

Thanks for your explanation.

[]'s

Elias.


2013/10/10 Viktor Jevdokimov <viktor.jevdoki...@adform.com>

 SSTableSimpleUnsortedWriter is an SSTable writer, not Cassandra itself, so
 it writes to the file exactly what you give it; you need to ensure
 consistency yourself.

 You can check the file before running sstableloader: all the data is in
 the SSTable, but instead of 1 row it will have 10 rows with the same
 key. Probably the same will arrive in Cassandra upon import.

 But when Cassandra reads the SSTable sequentially while searching for the
 key, only the first row (with the first column) will be returned, since the
 key has been found and there is no reason to scan further. It will not
 return many rows with the same key, because Cassandra does not expect more
 than one row with the same key in an SSTable.

Best regards / Pagarbiai
 Viktor Jevdokimov
 Senior Developer

 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063, Fax +370 5 261 0453
 J. Jasinskio 16C, LT-03163 Vilnius, Lithuania
 Follow us on Twitter: @adforminsider http://twitter.com/#!/adforminsider
 Take a ride with Adform's Rich Media Suite: http://vimeo.com/adform/richmedia
 Adform News: http://www.adform.com
 Adform awarded the Best Employer 2012:
 http://www.adform.com/site/blog/adform/adform-takes-top-spot-in-best-employer-survey/

 Disclaimer: The information contained in this message and attachments is
 intended solely for the attention and use of the named addressee and may be
 confidential. If you are not the intended recipient, you are reminded that
 the information remains the property of the sender. You must not use,
 disclose, distribute, copy, print or rely on this e-mail. If you have
 received this message in error, please contact the sender immediately and
 irrevocably delete this message and any copies.


RE: Problems in the cassandra bulk loader

2013-10-10 Thread Viktor Jevdokimov
You overwrite your columns by writing a new row/supercolumn on every iteration.

Move the new row/supercolumn calls out of the for statement, which is for columns only:


int rowKey = 10;
int superColumnKey = 20;
usersWriter.newRow(ByteBufferUtil.bytes(rowKey));
usersWriter.newSuperColumn(ByteBufferUtil.bytes(superColumnKey));
for (int i = 0; i < 10; i++) {
    usersWriter.addColumn(
        ByteBufferUtil.bytes(i + 1),
        ByteBufferUtil.bytes(i + 1),
        System.currentTimeMillis());
}
usersWriter.close();

Next time, ask such questions on the user mailing list, not the C* dev list, which
is for C* development, not for usage or for developing your own code around Cassandra.
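
When the input is not a single loop but a stream of records in which the same row key and super column recur, one way to follow the advice above is to group the columns first, so that each row and each super column is opened exactly once. The following is a minimal sketch using only java.util; the Col class and the group helper are hypothetical names introduced for illustration, and the actual writer calls appear only as comments.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupBeforeWrite {

    /** Hypothetical input record: one sub-column destined for a (row, super column) pair. */
    static final class Col {
        final int rowKey, superColumnKey, name, value;

        Col(int rowKey, int superColumnKey, int name, int value) {
            this.rowKey = rowKey;
            this.superColumnKey = superColumnKey;
            this.name = name;
            this.value = value;
        }
    }

    /** Groups columns by row key, then by super column key, preserving first-seen order. */
    static Map<Integer, Map<Integer, List<Col>>> group(List<Col> input) {
        Map<Integer, Map<Integer, List<Col>>> rows = new LinkedHashMap<>();
        for (Col c : input) {
            rows.computeIfAbsent(c.rowKey, k -> new LinkedHashMap<>())
                .computeIfAbsent(c.superColumnKey, k -> new ArrayList<>())
                .add(c);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Same data as in the thread: one row key, one super column, ten columns.
        List<Col> input = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            input.add(new Col(10, 20, i + 1, i + 1));
        }

        for (Map.Entry<Integer, Map<Integer, List<Col>>> row : group(input).entrySet()) {
            // Here: usersWriter.newRow(...), once per row key.
            System.out.println("newRow " + row.getKey());
            for (Map.Entry<Integer, List<Col>> sc : row.getValue().entrySet()) {
                // Here: usersWriter.newSuperColumn(...), once per super column.
                System.out.println("  newSuperColumn " + sc.getKey());
                for (Col c : sc.getValue()) {
                    // Here: usersWriter.addColumn(...), once per column.
                    System.out.println("    addColumn " + c.name + "=" + c.value);
                }
            }
        }
    }
}
```

The grouping is done in memory, so this only suits inputs that fit in the heap; for larger inputs the records would need to be sorted or batched by key first.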





Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer

-----Original Message-----
From: José Elias Queiroga da Costa Araújo [mailto:je...@cesar.org.br]
Sent: Wednesday, October 9, 2013 11:22 PM
To: dev
Subject: Problems in the cassandra bulk loader

Hi all,

I'm trying to use bulk insertion with the SSTableSimpleUnsortedWriter class
from the Cassandra API and I'm facing some problems. After generating the
.db files and uploading them with the ./sstableloader command, I noticed
that the data didn't match what I had inserted.

I've put the code I used below to try to explain the behaviour.

I'm trying to generate the data files using only one row key and one
super column, where the super column has 10 columns.

IPartitioner p = new Murmur3Partitioner();
CFMetaData scf = new CFMetaData("myKeySpace", "Column", ColumnFamilyType.Super,
        BytesType.instance, BytesType.instance);

SSTableSimpleUnsortedWriter usersWriter =
        new SSTableSimpleUnsortedWriter(new File("./"), scf, p, 64);

int rowKey = 10;
int superColumnKey = 20;
for (int i = 0; i < 10; i++) {
    usersWriter.newRow(ByteBufferUtil.bytes(rowKey));
    usersWriter.newSuperColumn(ByteBufferUtil.bytes(superColumnKey));
    usersWriter.addColumn(ByteBufferUtil.bytes(i + 1), ByteBufferUtil.bytes(i + 1),
            System.currentTimeMillis());
}
usersWriter.close();

After uploading, the result is:

RowKey: 000a
=> (super_column=0014,
     (name=0001, value=0001, timestamp=1381348293144))

1 Row Returned.

In this case, shouldn't my super column have 10 columns, with values from
0001 to 0011, since I'm using the same super column? The documentation says
the newRow method can be invoked many times and that this only impacts
performance.

The second question is: if this is the correct behavior, shouldn't the
column value be 0011, since it is the last value passed as an argument to
the addColumn(...) method in the loop?

Thanks in advance,

Elias.


Re: Problems in the cassandra bulk loader

2013-10-10 Thread José Elias Queiroga da Costa Araújo
Hi, I thought the bulk API could handle this by merging all columns
for the same super column. I did something like this with the Java client
(Hector), which is able to resolve this conflict by simply appending the
columns.

Regarding the column value: if the code is overwriting the columns, I
expected the column to have the last value of my collection, but it has
the first one.

Regards,

Elias.





RE: Problems in the cassandra bulk loader

2013-10-10 Thread Viktor Jevdokimov
SSTableSimpleUnsortedWriter is an SSTable writer, not Cassandra itself, so it
writes to the file exactly what you give it; you need to ensure consistency
yourself.

You can check the file before running sstableloader: all the data is in the
SSTable, but instead of 1 row it will have 10 rows with the same key. Probably
the same will arrive in Cassandra upon import.

But when Cassandra reads the SSTable sequentially while searching for the key,
only the first row (with the first column) will be returned, since the key has
been found and there is no reason to scan further. It will not return many rows
with the same key, because Cassandra does not expect more than one row with the
same key in an SSTable.
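
The lookup behaviour described above can be illustrated with a toy model. This is only a sketch of a first-match sequential scan over a list of key/value pairs; it is not Cassandra's actual read path, and the FirstMatchScan class and findFirst method are hypothetical names used for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FirstMatchScan {

    /** Returns the value of the first entry whose key matches, or null if none does. */
    static String findFirst(List<Map.Entry<String, String>> rows, String key) {
        for (Map.Entry<String, String> row : rows) {
            if (row.getKey().equals(key)) {
                // Stop at the first hit; later entries with the same key are never read.
                return row.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Three "rows" that all share the key 000a, as in the duplicated-key file.
        List<Map.Entry<String, String>> rows = new ArrayList<>();
        rows.add(new SimpleEntry<String, String>("000a", "0001"));
        rows.add(new SimpleEntry<String, String>("000a", "0002"));
        rows.add(new SimpleEntry<String, String>("000a", "0003"));

        System.out.println(findFirst(rows, "000a")); // prints 0001
    }
}
```

In this model the data for the later duplicates is present in the list but unreachable by key, which matches the description of all ten rows being in the file while only the first is returned.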


Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer
