Re: Problems in the cassandra bulk loader

José Elias Queiroga da Costa Araújo Tue, 15 Oct 2013 05:34:11 -0700

        Hi Viktor,

        Thanks for your explanation.


        []'s

        Elias.


2013/10/10 Viktor Jevdokimov <viktor.jevdoki...@adform.com>

>  SSTableSimpleUnsortedWriter is an SSTable writer, not Cassandra itself, so
> it just writes to the file exactly what you give it; you need to ensure
> consistency yourself.
>
> You can check the file before running sstableloader: all the data is in
> the SSTable, but instead of 1 row it will contain 10 rows with the same
> key. The same thing probably arrives in Cassandra on import.
>
> But when Cassandra searches an SSTable for a key, it reads it sequentially
> and returns only the first row it finds (the one holding the first
> column). Once the key is found there is no reason to scan further, so it
> will not return several rows with the same key: Cassandra does not expect
> an SSTable to contain more than one row per key.
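A toy model of the behaviour described above may help. The sketch below is not Cassandra's storage engine: `DuplicateRowLookup`, `Row` and the list standing in for the data file are hypothetical. It writes ten rows with the same key, as the original loop does, then runs a sequential lookup that stops at the first matching key, which yields only column 1 with value 1, consistent with the output reported in the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Toy model only, NOT Cassandra's storage engine (hypothetical names).
// A "file" may hold several rows with the same key, and a lookup stops
// at the first match because it assumes keys are unique per file.
class DuplicateRowLookup {

    // One row as written: a row key plus a single (name, value) column.
    record Row(int key, int columnName, int columnValue) {}

    // Mimics the original loop: every iteration starts a new row with the
    // same key, and the writer stores exactly what it is given.
    static List<Row> writeTenRows(int rowKey) {
        List<Row> file = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            file.add(new Row(rowKey, i + 1, i + 1));
        }
        return file;
    }

    // Sequential scan that returns the first row whose key matches and
    // does not look any further.
    static Optional<Row> firstMatch(List<Row> file, int key) {
        for (Row row : file) {
            if (row.key() == key) {
                return Optional.of(row);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Row> file = writeTenRows(10);
        System.out.println("rows in file: " + file.size());
        System.out.println("lookup returns: " + firstMatch(file, 10).orElseThrow());
    }
}
```

The other nine rows are still in the file; they are simply never reached by a lookup that trusts keys to be unique.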
>
>    Best regards / Pagarbiai
> *Viktor Jevdokimov*
> Senior Developer
>
> Email: viktor.jevdoki...@adform.com
> Phone: +370 5 212 3063, Fax +370 5 261 0453
> J. Jasinskio 16C, LT-03163 Vilnius, Lithuania
> Follow us on Twitter: @adforminsider <http://twitter.com/#!/adforminsider>
> Take a ride with Adform's Rich Media Suite<http://vimeo.com/adform/richmedia>
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
> *From:* José Elias Queiroga da Costa Araújo [mailto:je...@cesar.org.br]
> *Sent:* Thursday, October 10, 2013 4:33 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Problems in the cassandra bulk loader
>
>         Hi, I thought the bulk API could handle this by merging all columns
> for the same super column. I did something similar with the Java client
> (Hector), which resolves this conflict by simply appending the columns.
>
>         Regarding the column value: if the code is overwriting the
> columns, I expected the column to hold the last value of my collection,
> but it holds the first one.
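For comparison, here is a sketch of the merge expected above: columns with distinct names are appended, and for a repeated name the write with the higher timestamp wins. `ColumnMerge` and `Column` are hypothetical names, the tie rule (the later write wins on equal timestamps) is an assumption of this sketch, and per the reply above this is not what the bulk writer does with the original loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the expected merge (hypothetical names, not Cassandra code).
// Distinct column names are appended; for the same name the higher
// timestamp wins. Assumption: on equal timestamps the later write wins.
class ColumnMerge {

    record Column(int name, int value, long timestamp) {}

    static Map<Integer, Column> merge(List<Column> writes) {
        Map<Integer, Column> merged = new TreeMap<>(); // sorted by column name
        for (Column c : writes) {
            merged.merge(c.name(), c,
                    (existing, incoming) ->
                            incoming.timestamp() >= existing.timestamp() ? incoming : existing);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Ten distinct column names: all ten survive the merge.
        List<Column> distinct = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            distinct.add(new Column(i + 1, i + 1, 1000L + i));
        }
        System.out.println(merge(distinct).size());

        // One name written twice: the write with the higher timestamp wins.
        List<Column> overwrite = List.of(new Column(1, 1, 1000L), new Column(1, 10, 1009L));
        System.out.println(merge(overwrite).get(1).value());
    }
}
```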
>
>         Regards,
>
>         Elias.
>
> 2013/10/10 Viktor Jevdokimov <viktor.jevdoki...@adform.com>
>
> You overwrite your columns by starting a new row/super column on every
> iteration.
>
> Move the newRow/newSuperColumn calls out of the "for" statement, which
> should only add columns:
>
> int rowKey = 10;
> int superColumnKey = 20;
>
> // Start the row and the super column once, before the loop
> usersWriter.newRow(ByteBufferUtil.bytes(rowKey));
> usersWriter.newSuperColumn(ByteBufferUtil.bytes(superColumnKey));
>
> // Add all 10 columns to that same super column
> for (int i = 0; i < 10; i++) {
>     usersWriter.addColumn(
>             ByteBufferUtil.bytes(i + 1),   // column name
>             ByteBufferUtil.bytes(i + 1),   // column value
>             System.currentTimeMillis());   // timestamp
> }
> usersWriter.close();
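When the input data does not arrive grouped by key, one way to apply the fix above is to group it in memory first, so that each row and super column is started exactly once. This is a sketch under assumptions: `GroupedWrite`, `Cell` and `SuperColumnSink` are hypothetical; the interface only mirrors the newRow/newSuperColumn/addColumn call order and uses plain ints where the real writer takes ByteBuffers (e.g. via ByteBufferUtil.bytes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: group input by (row key, super column), then start each row and
// super column exactly once. SuperColumnSink is a HYPOTHETICAL interface
// mirroring the call order used with SSTableSimpleUnsortedWriter above.
class GroupedWrite {

    interface SuperColumnSink {
        void newRow(int rowKey);
        void newSuperColumn(int superColumnKey);
        void addColumn(int name, int value, long timestamp);
    }

    // One input record, possibly arriving in any order.
    record Cell(int rowKey, int superColumnKey, int name, int value) {}

    static void write(List<Cell> cells, SuperColumnSink sink, long timestamp) {
        // rowKey -> superColumnKey -> cells (TreeMap keeps keys sorted).
        Map<Integer, Map<Integer, List<Cell>>> grouped = new TreeMap<>();
        for (Cell c : cells) {
            grouped.computeIfAbsent(c.rowKey(), k -> new TreeMap<>())
                   .computeIfAbsent(c.superColumnKey(), k -> new ArrayList<>())
                   .add(c);
        }
        for (Map.Entry<Integer, Map<Integer, List<Cell>>> row : grouped.entrySet()) {
            sink.newRow(row.getKey());                  // once per row key
            for (Map.Entry<Integer, List<Cell>> sc : row.getValue().entrySet()) {
                sink.newSuperColumn(sc.getKey());       // once per super column
                for (Cell c : sc.getValue()) {
                    sink.addColumn(c.name(), c.value(), timestamp);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            cells.add(new Cell(10, 20, i + 1, i + 1));
        }
        // Recording sink: prints the sequence of calls instead of writing.
        List<String> calls = new ArrayList<>();
        write(cells, new SuperColumnSink() {
            public void newRow(int k) { calls.add("newRow " + k); }
            public void newSuperColumn(int k) { calls.add("newSuperColumn " + k); }
            public void addColumn(int n, int v, long ts) { calls.add("addColumn " + n); }
        }, 1L);
        System.out.println(calls);
    }
}
```

The recording sink shows one newRow 10, one newSuperColumn 20, then ten addColumn calls. Grouping everything in memory assumes the data set fits; larger inputs would need to be sorted or partitioned by key beforehand.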
>
> Next time, please ask such questions on the user mailing list, not the C*
> dev list, which is for developing C* itself, not for usage questions or
> your own code built around Cassandra.
>
> Best regards / Pagarbiai
> Viktor Jevdokimov
>
> -----Original Message-----
> From: José Elias Queiroga da Costa Araújo [mailto:je...@cesar.org.br]
> Sent: Wednesday, October 9, 2013 11:22 PM
> To: dev
> Subject: Problems in the cassandra bulk loader
>
>
>         Hi all,
>
>         I'm trying to use bulk insertion with the
> SSTableSimpleUnsortedWriter class from the Cassandra API and I'm facing
> some problems. After generating the .db files and uploading them with the
> ./sstableloader command, I noticed the data didn't match what I inserted.
>
>         I've put the code I used below to illustrate the behaviour.
>
>         I'm trying to generate the data files using only one row key and
> one super column, where the super column has 10 columns.
>
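> import java.io.File;
>
> import org.apache.cassandra.config.CFMetaData;
> import org.apache.cassandra.db.ColumnFamilyType;
> import org.apache.cassandra.db.marshal.BytesType;
> import org.apache.cassandra.dht.IPartitioner;
> import org.apache.cassandra.dht.Murmur3Partitioner;
> import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;
> import org.apache.cassandra.utils.ByteBufferUtil;
>
> IPartitioner p = new Murmur3Partitioner();
> CFMetaData scf = new CFMetaData("myKeySpace", "Column",
>         ColumnFamilyType.Super, BytesType.instance, BytesType.instance);
>
> // Write to the current directory; last argument is the buffer size in MB
> SSTableSimpleUnsortedWriter usersWriter =
>         new SSTableSimpleUnsortedWriter(new File("./"), scf, p, 64);
>
> int rowKey = 10;
> int superColumnKey = 20;
> for (int i = 0; i < 10; i++) {
>     // newRow/newSuperColumn are called again on every iteration
>     usersWriter.newRow(ByteBufferUtil.bytes(rowKey));
>     usersWriter.newSuperColumn(ByteBufferUtil.bytes(superColumnKey));
>     usersWriter.addColumn(ByteBufferUtil.bytes(i + 1),
>             ByteBufferUtil.bytes(i + 1), System.currentTimeMillis());
> }
> usersWriter.close();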
>
>                 After uploading, the result is:
>
>                 RowKey: 0000000a
>                    => (super_column=00000014,
>                               (name=00000001, value=00000001, timestamp=1381348293144))
>
>                 1 Row Returned.
>
>                 In this case, shouldn't my super column have 10 columns,
> with values from 00000001 to 0000000a, since I'm using the same super
> column? The documentation says the newRow method can be invoked many
> times and that this only impacts performance.
>
>                 The second question: if this is the correct behaviour,
> shouldn't the column value be 0000000a, since it is the last value passed
> as an argument to the addColumn(...) method in the loop?
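The expected values can be checked against the hex notation in the output above, where row key 10 is shown as 0000000a and super column 20 as 00000014. This suggests ByteBufferUtil.bytes(int) stores an int as 4 big-endian bytes; assuming that, the sketch below (plain java.nio, no Cassandra dependency; `IntKeyHex` is a hypothetical name) prints how the loop's values 1 to 10 are encoded.

```java
import java.nio.ByteBuffer;

// Assumption: ByteBufferUtil.bytes(int) encodes an int as 4 big-endian
// bytes, matching the output above. Uses only java.nio, so it runs
// without Cassandra on the classpath.
class IntKeyHex {

    static String hex(int v) {
        ByteBuffer buf = ByteBuffer.allocate(4).putInt(v); // big-endian by default
        StringBuilder sb = new StringBuilder();
        for (byte b : buf.array()) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Row key and super column key: prints "0000000a 00000014"
        System.out.println(hex(10) + " " + hex(20));
        // Column names/values written by the loop (i + 1 for i = 0..9)
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            line.append(hex(i + 1)).append(' ');
        }
        System.out.println(line.toString().trim());
    }
}
```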
>
>               Thanks in advance,
>
>                Elias.
>