Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Philip Thompson
Splitting the batches by partition key and inserting them with a TokenAware policy is already possible with existing driver code, though you will have to split the batches yourself. On Fri, Dec 5, 2014 at 3:12 PM, Dong Dai wrote: > Err, am I misunderstanding something? > I thought Tyler was going
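
The "split the batches yourself" step can be sketched roughly as follows. This is only an illustration in plain Java, not driver code: the BatchSplitter class, the Row type, and splitByPartitionKey are hypothetical names, and the partition key is assumed to be a single String. Each resulting group would then be sent as its own small unlogged batch through a session configured with a token-aware load balancing policy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any driver: groups rows by partition key
// so that each group can be sent as its own small unlogged batch.
class BatchSplitter {

    // Hypothetical row type: a partition key plus an opaque payload.
    static final class Row {
        final String partitionKey;
        final String payload;

        Row(String partitionKey, String payload) {
            this.partitionKey = partitionKey;
            this.payload = payload;
        }
    }

    // Returns one list of rows per partition key, preserving first-seen key order.
    static Map<String, List<Row>> splitByPartitionKey(List<Row> rows) {
        Map<String, List<Row>> groups = new LinkedHashMap<>();
        for (Row row : rows) {
            groups.computeIfAbsent(row.partitionKey, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }
}
```

With the real driver, each group would presumably become one batch statement whose routing key is that group's partition key, so a token-aware policy can pick a replica as the coordinator.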

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Dong Dai
Err, am I misunderstanding something? I thought Tyler was going to add some code to split unlogged batches and make batch insertion token-aware. Is that already done? If not, I can do it too. Thanks, - Dong > On Dec 5, 2014, at 2:06 PM, Philip Thompson > wrote: > > What progress are you try

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Philip Thompson
What progress are you trying to be aware of? All of the features Tyler discussed are implemented and can be used. On Fri, Dec 5, 2014 at 2:41 PM, Dong Dai wrote: > > On Dec 5, 2014, at 11:23 AM, Tyler Hobbs wrote: > > > On Fri, Dec 5, 2014 at 1:15 AM, Dong Dai wrote: > >> Sounds great! By the

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Dong Dai
> On Dec 5, 2014, at 11:23 AM, Tyler Hobbs wrote: > > > On Fri, Dec 5, 2014 at 1:15 AM, Dong Dai > wrote: > Sounds great! By the way, will you create a ticket for this, so we can follow > the updates? > > What would the ticket be for? (I might have missed somethi

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Tyler Hobbs
On Fri, Dec 5, 2014 at 1:15 AM, Dong Dai wrote: > Sounds great! By the way, will you create a ticket for this, so we can > follow the updates? What would the ticket be for? (I might have missed something in the conversation.) -- Tyler Hobbs DataStax

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-04 Thread Dong Dai
> On Dec 4, 2014, at 1:46 PM, Tyler Hobbs wrote: > > > On Thu, Dec 4, 2014 at 11:50 AM, Dong Dai > wrote: > As we already did what coordinators do in client side, why don’t we do one > step more: > break the UNLOGGED batch statements into several small batch statem

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-04 Thread Shane Hansen
I'd be really interested to know what sort of performance or load improvements you see by doing client side partitioning. Please post back some results if you've tried that strategy. On Thu, Dec 4, 2014 at 11:46 AM, Tyler Hobbs wrote: > > On Thu, Dec 4, 2014 at 11:50 AM, Dong Dai wrote: > >> As

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-04 Thread Tyler Hobbs
On Thu, Dec 4, 2014 at 11:50 AM, Dong Dai wrote: > As we already did what coordinators do in client side, why don’t we do one > step more: > break the UNLOGGED batch statements into several small batch statements, > each of which contains > the statements with the same partition key. And send the

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-04 Thread Dong Dai
> On Dec 4, 2014, at 11:37 AM, Tyler Hobbs wrote: > > > On Wed, Dec 3, 2014 at 11:02 PM, Dong Dai > wrote: > > 1) unless I am using TokenAwarePolicy, the async insert also cannot be sent > to > the right coordinator. > > Yes. Of course, TokenAwarePolicy can w

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-04 Thread Tyler Hobbs
On Wed, Dec 3, 2014 at 11:02 PM, Dong Dai wrote: > > 1) unless I am using TokenAwarePolicy, the async insert also cannot be > sent to > the right coordinator. > Yes. Of course, TokenAwarePolicy can wrap any other policy. > > 2) the TokenAwarePolicy actually is doing the job that coordinators

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-03 Thread Dong Dai
Thanks a lot for the great answers. P.S. I moved this thread here from dev. Checking the source code of the java-driver, I noticed that the execute() method is implemented using executeAsync() with an immediate get(): @Override public ResultSet execute(Statement statement) { return ex
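
The general pattern described here, a synchronous call built as an asynchronous call followed by an immediate get(), can be illustrated with a minimal sketch. It uses only java.util.concurrent and is not the driver's code: the SyncOverAsync class and its String-based executeAsync are hypothetical stand-ins for the driver's session and result types.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// Hypothetical stand-in for a session; only the sync-over-async shape matters.
class SyncOverAsync {

    // Simulated asynchronous call: completes on another thread.
    static CompletableFuture<String> executeAsync(String statement) {
        return CompletableFuture.supplyAsync(() -> "applied: " + statement);
    }

    // Synchronous variant: start the async call, then block on get() right away.
    static String execute(String statement) {
        try {
            return executeAsync(statement).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Under this shape, the synchronous call does the same work as the asynchronous one and simply blocks the caller until the result arrives, so issuing many executeAsync() calls before waiting lets requests overlap.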

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-01 Thread Robert Coli
On Mon, Dec 1, 2014 at 12:10 PM, Dong Dai wrote: > I guess you mean that BulkLoader is done by streaming whole SSTable to > remote servers, so it is faster? > Well, it's not exactly "whole SSTable" but yes, that's the sort of statement I'm making. [1] > The documentation says that all the rows

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-01 Thread Dong Dai
Thanks Rob, I guess you mean that BulkLoader is done by streaming whole SSTable to remote servers, so it is faster? The documentation says that all the rows in the SSTable will be inserted into the new cluster conforming to the replication strategy of that cluster. This gives me a feeling tha

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-01 Thread Robert Coli
On Sun, Nov 30, 2014 at 8:44 PM, Dong Dai wrote: > The question is can I expect better performance using the BulkLoader > this way compared with using Batch insert? > You just asked if writing once (via streaming) is likely to be significantly more efficient than writing twice (once to the co

Performance Difference between Batch Insert and Bulk Load

2014-11-30 Thread Dong Dai
Hi all, I have a performance question about batch insert and bulk load. According to the documentation, to import a large volume of data into Cassandra, Batch Insert and Bulk Load are both options. Using batch insert is pretty straightforward, but there has not been an ‘official’ way to