Re: map/reduce on Cassandra

2010-01-25 Thread Vijay
+1

Regards,

On Mon, Jan 25, 2010 at 10:47 AM, Jeff Hodges  wrote:

> 1) Works with RandomPartitioner. This is huge and the only way almost
> everyone would be able to use it.
>
> 2) Ability to divide up the keys of a single node to more than one
> mapper. The prototype just slurped up everything on the node. This
> would probably be easiest to not allow as a configurable thing and
> just let it be part of the InputSplit calculation.
>
> 3) Progress information should be calculated and displayed.
>


>  --
> Jeff
>
> On Mon, Jan 25, 2010 at 5:43 AM, Phillip Michalak
>  wrote:
> > Multiple people have expressed an interest in 'hadoop integration' and
> > 'map/reduce functionality' within Cassandra. I'd like to get a feel for what
> > that means to different people.
> >
> > As a starting point for discussion, Jeff Hodges undertook a prototype effort
> > last summer which was the subject of this thread:
> > http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200907.mbox/%3cf5f3a6290907240123y22f065edp1649f7c5c1add...@mail.gmail.com%3e.
> >
> > Jeff explicitly mentions data locality as one of the things that was out of
> > scope for the prototype. What other features or characteristics would you
> > expect to see in an implementation?
> >
> > Thanks,
> > Phil
> >
>


Re: map/reduce on Cassandra

2010-01-25 Thread Jonathan Ellis
sstablekeys is really the wrong place to support m/r anyway; it just
shows that the index can handle what m/r will need.

On Mon, Jan 25, 2010 at 1:28 PM, Ryan Daum  wrote:
> On Mon, Jan 25, 2010 at 2:18 PM, Brandon Williams  wrote:
>
>> bin/sstablekeys will dump just the keys from an sstable without row
>> deserialization overhead, but it can't introspect a commitlog.
>> -Brandon
>
> Yes, and will it not also return the keys that are replicas from
> ranges 'belonging' to other nodes? I.e. running it on all boxes across
> a cluster of  with an RF > 1 would return duplicates where the data
> was replicated. Needs a flag to indicate uniqueness.
>
> Ryan
>


Re: map/reduce on Cassandra

2010-01-25 Thread Ryan Daum
On Mon, Jan 25, 2010 at 2:18 PM, Brandon Williams  wrote:

> bin/sstablekeys will dump just the keys from an sstable without row
> deserialization overhead, but it can't introspect a commitlog.
> -Brandon

Yes, and will it not also return the keys that are replicas from
ranges 'belonging' to other nodes? I.e. running it on all boxes across
a cluster of  with an RF > 1 would return duplicates where the data
was replicated. Needs a flag to indicate uniqueness.

Ryan
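
A minimal sketch of the uniqueness filter described above, assuming a simplified ring model: each node owns the interval (previous token, own token], and a key's token is the MD5 of the key reduced into the ring. The names token_for, in_range, primary_keys and RING_SIZE are hypothetical and are not part of sstablekeys or any Cassandra API; the real partitioner's token computation may differ in detail.

```python
import hashlib

RING_SIZE = 2 ** 127  # assumed size of the token space in this toy model


def token_for(key: str) -> int:
    """Toy stand-in for a hash partitioner: MD5 of the key, reduced into the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def in_range(token: int, start: int, end: int) -> bool:
    """True if token lies in the ring interval (start, end], allowing wrap-around."""
    if start < end:
        return start < token <= end
    # The interval wraps past zero; start == end is treated as the whole ring.
    return token > start or token <= end


def primary_keys(keys, node_token: int, ring_tokens):
    """Keep only the keys whose token falls in the range owned by node_token."""
    ordered = sorted(ring_tokens)
    idx = ordered.index(node_token)
    prev_token = ordered[idx - 1]  # index 0 wraps around to the last token
    return [k for k in keys if in_range(token_for(k), prev_token, node_token)]
```

Running such a filter on every node's dump would, under these assumptions, emit each key exactly once across the cluster, regardless of the replication factor.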


Re: map/reduce on Cassandra

2010-01-25 Thread Brandon Williams
On Mon, Jan 25, 2010 at 1:13 PM, Ryan Daum  wrote:

> I agree with what Jeff says here about RandomPartitioner support being key.
>
>
+1


> For my purposes with map/reduce I'd personally be fine with some
> general all-keys dump utility that wrote contents of one node to a
> file, and then just write my own integration from that file into
> Hadoop, etc.
>
> I guess I'm thinking something similar to sstable2json except that
> unfortunately sstable2json will dump replica data not just the local
> node's data. Getting the contents of the commitlog into the file would
> be nice, too.


bin/sstablekeys will dump just the keys from an sstable without row
deserialization overhead, but it can't introspect a commitlog.

-Brandon
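
A rough sketch of how key dumps from several sstables on one node could be combined, since the same key can appear in more than one sstable. It assumes each dump has been saved as a text file with one key per line; that file layout and the helper name unique_keys are assumptions for illustration. Keys that exist only in the commitlog would still be missing.

```python
def unique_keys(dump_paths):
    """Yield each key once, in first-seen order, across several key-dump files.

    Assumes one key per line; blank lines are skipped.
    """
    seen = set()
    for path in dump_paths:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                key = line.rstrip("\n")
                if key and key not in seen:
                    seen.add(key)
                    yield key
```

The set holds every distinct key in memory, so this only suits dumps that fit in RAM; a larger dump would need an external sort or a similar approach.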


Re: map/reduce on Cassandra

2010-01-25 Thread Ryan Daum
I agree with what Jeff says here about RandomPartitioner support being key.

For my purposes with map/reduce I'd personally be fine with some
general all-keys dump utility that wrote contents of one node to a
file, and then just write my own integration from that file into
Hadoop, etc.

I guess I'm thinking something similar to sstable2json except that
unfortunately sstable2json will dump replica data not just the local
node's data. Getting the contents of the commitlog into the file would
be nice, too.

R

On Mon, Jan 25, 2010 at 1:47 PM, Jeff Hodges  wrote:
> 1) Works with RandomPartitioner. This is huge and the only way almost
> everyone would be able to use it.
> 2) Ability to divide up the keys of a single node to more than one
> mapper. The prototype just slurped up everything on the node. This
> would probably be easiest to not allow as a configurable thing and
> just let it be part of the InputSplit calculation.
> 3) Progress information should be calculated and displayed.
>  --
> Jeff
>
> On Mon, Jan 25, 2010 at 5:43 AM, Phillip Michalak
>  wrote:
>> Multiple people have expressed an interest in 'hadoop integration' and
>> 'map/reduce functionality' within Cassandra. I'd like to get a feel for what
>> that means to different people.
>>
>> As a starting point for discussion, Jeff Hodges undertook a prototype effort
>> last summer which was the subject of this thread:
>> http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200907.mbox/%3cf5f3a6290907240123y22f065edp1649f7c5c1add...@mail.gmail.com%3e.
>>
>> Jeff explicitly mentions data locality as one of the things that was out of
>> scope for the prototype. What other features or characteristics would you
>> expect to see in an implementation?
>>
>> Thanks,
>> Phil
>>
>


Re: map/reduce on Cassandra

2010-01-25 Thread Jeff Hodges
1) Works with RandomPartitioner. This is huge and the only way almost
everyone would be able to use it.
2) Ability to divide up the keys of a single node to more than one
mapper. The prototype just slurped up everything on the node. This
would probably be easiest to not allow as a configurable thing and
just let it be part of the InputSplit calculation.
3) Progress information should be calculated and displayed.
 --
Jeff

On Mon, Jan 25, 2010 at 5:43 AM, Phillip Michalak
 wrote:
> Multiple people have expressed an interest in 'hadoop integration' and
> 'map/reduce functionality' within Cassandra. I'd like to get a feel for what
> that means to different people.
>
> As a starting point for discussion, Jeff Hodges undertook a prototype effort
> last summer which was the subject of this thread:
> http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200907.mbox/%3cf5f3a6290907240123y22f065edp1649f7c5c1add...@mail.gmail.com%3e.
>
> Jeff explicitly mentions data locality as one of the things that was out of
> scope for the prototype. What other features or characteristics would you
> expect to see in an implementation?
>
> Thanks,
> Phil
>
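
Points 2 and 3 above can be sketched roughly as follows, assuming a node's data is addressed by a token interval (start, end] on a ring. split_range and progress are hypothetical helper names, not Hadoop or Cassandra APIs; in a real job the sub-ranges would presumably back Hadoop InputSplit objects and the fraction would feed a RecordReader's progress report.

```python
def split_range(start: int, end: int, parts: int, ring_size: int = 2 ** 127):
    """Divide the ring interval (start, end] into contiguous sub-ranges.

    start == end is treated as the whole ring. At most `parts` sub-ranges
    are returned; fewer if the interval is narrower than `parts` tokens.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    width = (end - start) % ring_size or ring_size
    parts = min(parts, width)  # avoid empty sub-ranges
    bounds = [(start + width * i // parts) % ring_size for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def progress(split_start: int, split_end: int, current_token: int,
             ring_size: int = 2 ** 127) -> float:
    """Fraction of a split's token interval already scanned, between 0.0 and 1.0."""
    width = (split_end - split_start) % ring_size or ring_size
    done = (current_token - split_start) % ring_size
    return min(done / width, 1.0)
```

For example, on a toy ring of size 1000, split_range(10, 110, 4, ring_size=1000) gives four sub-ranges of 25 tokens each. With a hash-based partitioner the keys should be spread roughly evenly over tokens, so token distance is a reasonable progress estimate; with an order-preserving partitioner it could be badly skewed.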


map/reduce on Cassandra

2010-01-25 Thread Phillip Michalak
Multiple people have expressed an interest in 'hadoop integration' and  
'map/reduce functionality' within Cassandra. I'd like to get a feel  
for what that means to different people.


As a starting point for discussion, Jeff Hodges undertook a prototype
effort last summer which was the subject of this thread:
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200907.mbox/%3cf5f3a6290907240123y22f065edp1649f7c5c1add...@mail.gmail.com%3e.


Jeff explicitly mentions data locality as one of the things that was  
out of scope for the prototype. What other features or characteristics  
would you expect to see in an implementation?


Thanks,
Phil


RE: Map Reduce on Cassandra Store

2009-12-04 Thread Mark Vigeant
AH, there we go. Thanks a lot Ryan!

-Mark

-----Original Message-----
From: Ryan King [mailto:r...@twitter.com]
Sent: Friday, December 04, 2009 12:17 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Map Reduce on Cassandra Store

On Fri, Dec 4, 2009 at 8:44 AM, Mark Vigeant
 wrote:
> Hello!
>
>
>
> Has anyone tried to run MapReduce analytics on data stored in Cassandra? I
> feel like I saw a patch once to get hadoop working on top of Cassandra, but
> I can't find it now. I know that Hadoop integration is big on people's
> wishlists for future versions of Cassandra, but I'm just curious as to
> what's available now.

http://issues.apache.org/jira/browse/CASSANDRA-342

There's no easy way to do it now, but I know we will certainly need it
at some point (as will others), so I'm sure it will eventually happen.

-ryan

>
>
>
> Can anybody out there lend me a hand, or should I stick to HBase? Thanks a
> lot!
>
>
>
>
>
> Mark Vigeant
>
> RiskMetrics Group, Inc.
>
>
>
> This email message and any attachments are for the sole use of the intended
> recipients and may contain proprietary and/or confidential information which
> may be privileged or otherwise protected from disclosure. Any unauthorized
> review, use, disclosure or distribution is prohibited. If you are not an
> intended recipient, please contact the sender by reply email and destroy the
> original message and any copies of the message as well as any attachments to
> the original message.
>


Re: Map Reduce on Cassandra Store

2009-12-04 Thread Ryan King
On Fri, Dec 4, 2009 at 8:44 AM, Mark Vigeant
 wrote:
> Hello!
>
>
>
> Has anyone tried to run MapReduce analytics on data stored in Cassandra? I
> feel like I saw a patch once to get hadoop working on top of Cassandra, but
> I can’t find it now. I know that Hadoop integration is big on people’s
> wishlists for future versions of Cassandra, but I’m just curious as to
> what’s available now.

http://issues.apache.org/jira/browse/CASSANDRA-342

There's no easy way to do it now, but I know we will certainly need it
at some point (as will others), so I'm sure it will eventually happen.

-ryan

>
>
>
> Can anybody out there lend me a hand, or should I stick to HBase? Thanks a
> lot!
>
>
>
>
>
> Mark Vigeant
>
> RiskMetrics Group, Inc.
>


Map Reduce on Cassandra Store

2009-12-04 Thread Mark Vigeant
Hello!

Has anyone tried to run MapReduce analytics on data stored in Cassandra? I feel 
like I saw a patch once to get hadoop working on top of Cassandra, but I can't 
find it now. I know that Hadoop integration is big on people's wishlists for 
future versions of Cassandra, but I'm just curious as to what's available now.

Can anybody out there lend me a hand, or should I stick to HBase? Thanks a lot!


Mark Vigeant
RiskMetrics Group, Inc.

