RE: Cassandra 2.0.x OOM during startup - schema version inconsistency after reboot

2016-05-10 Thread Michael Fong
Hi,

Thanks for your recommendation. 
I also opened a ticket to keep track @ 
https://issues.apache.org/jira/browse/CASSANDRA-11748
Hope this brings it to someone's attention to take a look. Thanks.

Sincerely,

Michael Fong

-Original Message-
From: Michael Kjellman [mailto:mkjell...@internalcircle.com] 
Sent: Monday, May 09, 2016 11:57 AM
To: d...@cassandra.apache.org
Cc: user@cassandra.apache.org
Subject: Re: Cassandra 2.0.x OOM during startup - schema version inconsistency 
after reboot

I'd recommend you create a JIRA! That way you can get some traction on the 
issue. Obviously an OOM is never correct, even if your process is wrong in some 
way!

Best,
kjellman 

Sent from my iPhone

> On May 8, 2016, at 8:48 PM, Michael Fong  
> wrote:
> 
> Hi, all,
> 
> 
> Haven't heard any responses so far, and this issue has troubled us for quite 
> some time. Here is another update:
> 
> We have noticed several times that the schema version may change after 
> migration and reboot:
> 
> Here is the scenario:
> 
> 1.   Two node cluster (1 & 2).
> 
> 2.   There are some schema changes, e.g. creating a few new column families. 
> The cluster will wait until both nodes have schema version in sync (describe 
> cluster) before moving on.
> 
> 3.   Right before node2 is rebooted, the schema version is consistent; 
> however, after node2 reboots and starts servicing, the MigrationManager would 
> gossip a different schema version.
> 
> 4.   Afterwards, both nodes start exchanging schema messages 
> indefinitely until one of the nodes dies.
> 
> We currently suspect the schema change is due to replaying old entries in the 
> commit log. We wish to continue digging further, but need expert help on this.
> 
> I don't know if anyone has seen this before, or if there is anything wrong 
> with our migration flow though..
> 
> Thanks in advance.
> 
> Best regards,
> 
> 
> Michael Fong
> 
> From: Michael Fong [mailto:michael.f...@ruckuswireless.com]
> Sent: Thursday, April 21, 2016 6:41 PM
> To: user@cassandra.apache.org; d...@cassandra.apache.org
> Subject: RE: Cassandra 2.0.x OOM during bootstrap
> 
> Hi, all,
> 
> Here is some more information on before the OOM happened on the rebooted node 
> in a 2-node test cluster:
> 
> 
> 1.   It seems the schema version has changed on the rebooted node after 
> reboot, i.e.
> Before reboot,
> Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 
> MigrationManager.java (line 328) Gossiping my schema version 
> 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 
> MigrationManager.java (line 328) Gossiping my schema version 
> 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> 
> After rebooting node 2,
> Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java 
> (line 328) Gossiping my schema version 
> f5270873-ba1f-39c7-ab2e-a86db868b09b
> 
> 
> 
> 2.   After reboot, both nodes repeatedly send MigrationTask to each other 
> - we suspect it is related to the schema version (Digest) mismatch after Node 
> 2 rebooted:
> Node 2 keeps submitting the migration task over 100 times to the other 
> node.
> INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node /192.168.88.33 has restarted, now UP
> INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) Updating topology for /192.168.88.33
> INFO [GossipStage:1] 2016-04-19 11:18:18,263 StorageService.java (line 1544) Node /192.168.88.33 state jump to normal
> INFO [GossipStage:1] 2016-04-19 11:18:18,264 TokenMetadata.java (line 414) Updating topology for /192.168.88.33
> DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
> DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
> DEBUG [MigrationStage:1] 2016-04-19 11:18:18,268 MigrationTask.java (line 62) Can't send schema pull request: node /192.168.88.33 is down.
> DEBUG [MigrationStage:1] 2016-04-19 11:18:18,268 MigrationTask.java (line 62) Can't send schema pull request: node /192.168.88.33 is down.
> DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,353 Gossiper.java (line 977) removing expire time for endpoint : /192.168.88.33
> INFO [RequestResponseStage:1] 2016-04-19 11:18:18,353 Gossiper.java (line 978) InetAddress /192.168.88.33 is now UP
> DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,353 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
> DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,355 Gossiper.java (line 977) removing expire time for endpoint : /192.168.88.33
> INFO [RequestResponseStage:1] 2016-04-19 11:18:18,355 Gossiper.java (line 978) InetAddress /192.168.88.33 is now UP
> DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,355 MigrationManager.java (line 102) Submitting migration task for 

RE: Accessing Cassandra data from Spark Shell

2016-05-10 Thread Mohammed Guller
Yes, it is very simple to access Cassandra data using Spark shell.

Step 1: Launch the spark-shell with the spark-cassandra-connector package
$SPARK_HOME/bin/spark-shell --packages 
com.datastax.spark:spark-cassandra-connector_2.10:1.5.0

Step 2: Create a DataFrame pointing to your Cassandra table
val dfCassTable = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "your_column_family", "keyspace" -> "your_keyspace"))
  .load()

From this point onward, you have complete access to the DataFrame API. You can 
even register it as a temporary table, if you would prefer to use SQL/HiveQL.
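
For example, a minimal sketch of that last step, run in the same spark-shell
session (the "cass_table" alias and the query are only illustrative):

// Register the DataFrame from Step 2 under a temporary name and query it with SQL
dfCassTable.registerTempTable("cass_table")
val sample = sqlContext.sql("SELECT * FROM cass_table LIMIT 10")
sample.show()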

Mohammed
Author: Big Data Analytics with Spark

From: Ben Slater [mailto:ben.sla...@instaclustr.com]
Sent: Monday, May 9, 2016 9:28 PM
To: user@cassandra.apache.org; user
Subject: Re: Accessing Cassandra data from Spark Shell

You can use SparkShell to access Cassandra via the Spark Cassandra connector. 
The getting started article on our support page will probably give you a good 
steer to get started even if you’re not using Instaclustr: 
https://support.instaclustr.com/hc/en-us/articles/213097877-Getting-Started-with-Instaclustr-Spark-Cassandra-

Cheers
Ben

On Tue, 10 May 2016 at 14:08 Cassa L wrote:
Hi,
Has anyone tried accessing Cassandra data using SparkShell? How do you do it? 
Can you use HiveContext for Cassandra data? I'm using community version of 
Cassandra-3.0

Thanks,
LCassa
--

Ben Slater
Chief Product Officer, Instaclustr
+61 437 929 798


Re: COPY TO export fails with

2016-05-10 Thread Stefania Alborghetti
For COPY TO you can try increasing the page timeout or decreasing the page
size:

PAGETIMEOUT=10   - the page timeout in seconds for fetching results
PAGESIZE='1000'  - the page size for fetching results

You can pass these options to the COPY command by adding "WITH
PAGETIMEOUT=1000;", for example.

It will be slower than Spark, but to improve performance you can install the
Python driver with Cython extensions as explained in the Setup section of this
blog. The blog also explains how to compile the copy module itself with Cython.
This is not as important as compiling the driver, and on some versions you
may hit CASSANDRA-11574.



On Tue, May 10, 2016 at 6:39 PM, Matthias Niehoff <
matthias.nieh...@codecentric.de> wrote:

> Hi,
>
> I already suspected that COPY TO might not be the best way to do this. I’ll write a
> small Spark job.
>
> Thanks
>
> 2016-05-10 10:36 GMT+02:00 Carlos Rolo :
>
>> Hello,
>>
>> That is a lot of data to do a "COPY TO".
>>
>> If you want a fast way to export, and you're fine with Java, you can use
>> Cassandra SSTableReader classes to read the sstables directly. Spark also
>> works.
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
>> *linkedin.com/in/carlosjuzarterolo
>> *
>> Mobile: +351 918 918 100
>> www.pythian.com
>>
>> On Tue, May 10, 2016 at 9:33 AM, Matthias Niehoff <
>> matthias.nieh...@codecentric.de> wrote:
>>
>>> sry, sent early..
>>>
>>> more errors:
>>>
>>> /export.cql:9:Error for (4549395184516451179, 4560441269902768904): 
>>> NoHostAvailable - ('Unable to complete the operation against any hosts', 
>>> {: ConnectionException('Host has been marked 
>>> down or removed',)}) (will try again later attempt 1 of 5)
>>> /export.cql:9:Error for (-2083690356124961461, -2068514534992400755): 
>>> NoHostAvailable - ('Unable to complete the operation against any hosts', 
>>> {}) (will try again later attempt 1 of 5)
>>> /export.cql:9:Error for (-4899866517058128956, -4897773268483324406): 
>>> NoHostAvailable - ('Unable to complete the operation against any hosts', 
>>> {}) (will try again later attempt 1 of 5)
>>> /export.cql:9:Error for (-1435092096023471089, -1434747957681478442): 
>>> NoHostAvailable - ('Unable to complete the operation against any hosts', 
>>> {}) (will try again later attempt 1 of 5)
>>> /export.cql:9:Error for (-2804962318029794069, -2783747272192843127): 
>>> NoHostAvailable - ('Unable to complete the operation against any hosts', 
>>> {}) (will try again later attempt 1 of 5)
>>> /export.cql:9:Error for (-5188633782964403059, -5149722481923709224): 
>>> NoHostAvailable - ('Unable to complete the operation against any hosts', 
>>> {}) (will try again later attempt 1 of 5)
>>>
>>>
>>>
>>> It looks like neither the cluster nor the individual nodes can handle the export.
>>>
>>> Is the cqlsh copy able to export this amount of data? or should other 
>>> methods be used (sstableloader, some custom code, spark…)
>>>
>>>
>>> Best regards
>>>
>>>
>>> 2016-05-10 10:29 GMT+02:00 Matthias Niehoff <
>>> matthias.nieh...@codecentric.de>:
>>>
 Hi,

 i try to export data of a table (~15GB) using the cqlsh copy to. It
 fails with „no host available“. If I try it with a smaller table everything
 works fine.

 The statistics of the big table:

 SSTable count: 81
 Space used (live): 14102945336
 Space used (total): 14102945336
 Space used by snapshots (total): 62482577
 Off heap memory used (total): 16399540
 SSTable Compression Ratio: 0.1863544514417909
 Number of keys (estimate): 5034845
 Memtable cell count: 5590
 Memtable data size: 18579542
 Memtable off heap memory used: 0
 Memtable switch count: 72
 Local read count: 0
 Local read latency: NaN ms
 Local write count: 139878
 Local write latency: 0.023 ms
 Pending flushes: 0
 Bloom filter false positives: 0
 Bloom filter false ratio: 0.0
 Bloom filter space used: 6224240
 Bloom filter off heap memory used: 6223592
 Index summary off heap memory used: 1098860
 Compression metadata off heap memory used: 9077088
 Compacted partition minimum bytes: 373
 Compacted partition maximum bytes: 1358102
 Compacted partition mean 

Re: Cassandra 3.0.6 Release?

2016-05-10 Thread Tyler Hobbs
On Mon, May 9, 2016 at 2:48 PM, Drew Kutcharian  wrote:

>
>
> What’s the 3.0.6 release date? Seems like the code has been frozen for a
> few days now. I ask because I want to install Cassandra on Ubuntu 16.04 and
> CASSANDRA-10853 is blocking it.
>

We've been holding it up to sync it with the 3.6 release.  There were a
couple of bugs in the first 3.6-tentative tag that forced us to re-roll and
restart test runs.  The release vote for 3.0.6 and 3.6 should start within
the next couple of days, and takes 72 hours to complete.


-- 
Tyler Hobbs
DataStax 


Re: Lots of hints, but only on a few nodes

2016-05-10 Thread Nate McCall
The most immediate work-around would be to run nodetool disablehints around the
cluster before you load data. This would at least stop it snowballing from hints.


On Tue, May 10, 2016 at 7:49 AM, Erik Forsberg  wrote:

> I have this situation where a few (like, 3-4 out of 84) nodes misbehave.
> Very long GC pauses, dropping out of cluster etc.
>
> This happens while loading data (via CQL), and analyzing metrics it looks
> like on these few nodes, a lot of hints are being generated close to the
> time when they start to misbehave.
>
> Since this is Cassandra 2.0.13, which has a less than optimal hints
> implementation, large numbers of hints are a GC troublemaker.
>
> Again looking at metrics, it looks like hints are being generated for a
> large number of nodes, so it doesn't look like the destination nodes are at
> fault. So, I'm confused.
>
> Any Hints (pun intended) on what could cause a few nodes to generate more
> hints than the rest of the cluster?
>
> Regards,
> \EF
>



-- 
-
Nate McCall
Austin, TX
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Nodetool repair question

2016-05-10 Thread Joel Knighton
No - repair does not change token ownership. The up/down state of a node is
not related to token ownership.

On Tue, May 10, 2016 at 3:26 PM, Anubhav Kale 
wrote:

> Hello,
>
>
>
> Suppose I have 3 nodes, and stop Cassandra on one of them. Then I run a
> repair. Will repair move the token ranges from down node to other node ? In
> other words in any situation, does repair operation *ever* change token
> ownership ?
>
>
>
> Thanks !
>



-- 



Joel Knighton
Cassandra Developer | joel.knigh...@datastax.com

Nodetool repair question

2016-05-10 Thread Anubhav Kale
Hello,

Suppose I have 3 nodes, and stop Cassandra on one of them. Then I run a repair. 
Will repair move the token ranges from down node to other node ? In other words 
in any situation, does repair operation ever change token ownership ?

Thanks !


Re: A question to 'paging' support in DataStax java driver

2016-05-10 Thread Sebastian Estevez
I didn't read the whole thread last time around, please disregard my
comment about the java driver jira.

One other thought (hopefully relevant this time). Once we have
https://issues.apache.org/jira/browse/CASSANDRA-10783, you could write a
(*start*, *rows*) style paging UDF which would allow you to read
just page 4 for example. Granted you will still have to *scan* the data
from 0 to start at the server and throw it away, but might get you closer
to what you are looking for.
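
In the meantime, the only way to get the PagingState for page N is to actually
read the pages before it. A minimal sketch with the 3.x Java driver, written in
Scala (contact point, keyspace, table and fetch size are placeholders):

import com.datastax.driver.core.{Cluster, PagingState, SimpleStatement}

val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
val session = cluster.connect("your_keyspace")

// Page size is controlled by the fetch size on the statement.
val stmt = new SimpleStatement("SELECT * FROM your_table").setFetchSize(5000)

// First page: execute, consume the rows, then remember where we stopped.
val firstPage = session.execute(stmt)
val state: PagingState = firstPage.getExecutionInfo.getPagingState

// Later (even in a different request, after PagingState.toString /
// PagingState.fromString round-tripping), resume instead of re-reading page 1.
val nextPage = session.execute(stmt.setPagingState(state))

cluster.close()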




All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Tue, May 10, 2016 at 9:23 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> I think this request belongs in the java driver jira not the Cassandra
> jira.
>
> https://datastax-oss.atlassian.net/projects/JAVA/
>
> all the best,
>
> Sebastián
> On May 10, 2016 1:09 AM, "Lu, Boying"  wrote:
>
>> I filed a JIRA https://issues.apache.org/jira/browse/CASSANDRA-11741 to
>> track this.
>>
>>
>>
>> *From:* DuyHai Doan [mailto:doanduy...@gmail.com]
>> *Sent:* May 10, 2016 12:47
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: A question to 'paging' support in DataStax java driver
>>
>>
>>
>> I guess it's technically possible but then we'll need to update the
>> binary protocol. Just create a JIRA and ask for this feature
>>
>>
>>
>> On Tue, May 10, 2016 at 5:00 AM, Lu, Boying  wrote:
>>
>> Thanks very much.
>>
>>
>>
>> I understand that the data needs to be read from the DB to get the next
>> ‘PagingState’.
>>
>>
>>
>> But is it possible not to return those data to the client side, just
>> returning the ‘PagingState’?
>>
>> I.e. the data is read on the server side, but not return to client side,
>> this can save some bandwidth
>>
>> between client and server.
>>
>>
>>
>>
>>
>> *From:* DuyHai Doan [mailto:doanduy...@gmail.com]
>> *Sent:* May 9, 2016 21:06
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: A question to 'paging' support in DataStax java driver
>>
>>
>>
>> In a truly consistent world (should I say "snapshot isolation" world
>> instead), re-reading the same page should yield the same results no matter
>> how many new inserts have occurred since the last page read.
>>
>>
>>
>> Caching previous page at app level can be a solution but not viable if
>> the amount of data is huge, also you'll need a cache layer and deal with
>> cache invalidation etc ...
>>
>>
>>
>> The point is, providing snapshot isolation in a distributed system is
>> hard without some sort of synchronous coordination e.g. global lock (read
>> http://www.bailis.org/papers/hat-vldb2014.pdf)
>>
>>
>>
>>
>>
>> On Mon, May 9, 2016 at 2:17 PM, Bhuvan Rawal  wrote:
>>
>> Hi Doan,
>>
>>
>>
>> What does it have to do being eventual consistency? Lets assume a
>> scenario with complete consistency and we are at page X, and at the same
>> time some inserts/updates happened at page X-2 and we jumped to that.
>>
>> User will see inconsistent page in that case as well, right? Also in such
>> cases how would you design a user facing application (Cache previous pages
>> at app level?)
>>
>>
>>
>> Regards,
>>
>> Bhuvan
>>
>>
>>
>> On Mon, May 9, 2016 at 4:18 PM, DuyHai Doan  wrote:
>>
>> "Is it possible to just return PagingState object without returning
>> data?" --> No
>>
>>
>>
>> Simply because before reading the actual data for each page of N rows,
>> you cannot know at which token value a page of data starts...
>>
>>
>>
>> And it is worse than that: with paging you don't have any isolation.
>> Let's suppose you keep in your application/web front-end the paging states
>> for page 1, 2 and 3. Since there are concurrent inserts on the cluster at
>> the same time, when you re-use the paging state 2 for example, you may not
>> get the same results as the previous read.
>>
>>
>>
>> And it is inevitable in an eventual consistent distributed DB world
>>
>>
>>
>> On Mon, May 9, 2016 at 12:25 PM, Lu, Boying  wrote:
>>
>> Hi, All,
>>
>>
>>
>> We are considering to use DataStax java driver in our codes. One
>> important feature 

Re: Data platform support

2016-05-10 Thread Srini Sydney
I understand that Spark supports HDFS and standalone modes.
The recommendation on the Cassandra side is that Spark should be installed in
standalone mode in the SMACK framework.

On 10 May 2016 at 16:24, Sruti S  wrote:

> Not sure what is meant.. Spark can access HDFS. Why is it in standalone
> mode? Please clarify.
>
> On Tue, May 10, 2016 at 11:08 AM, Srini Sydney 
> wrote:
>
>> I have a clarification based on your answer -
>>
>> spark is installed as standalone mode (not hdfs) in SMACK framework. Our
>> data lake is in hdfs . How do we overcome this ?
>>
>>
>>  - cheers sreeni
>>
>>
>> On 10 May 2016, at 08:16, vincent gromakowski <
>> vincent.gromakow...@gmail.com> wrote:
>>
>> Maybe a SMACK stack would be a better option for using spark with
>> Cassandra...
>> On 10 May 2016 8:45 AM, "Srini Sydney" wrote:
>>
>>> Thanks a lot..denise
>>>
>>> On 10 May 2016 at 02:42, Denise Rogers  wrote:
>>>
 It really depends how close you want to stay to the most current
 versions of open source community products.

 Cloudera has tended to build more products that requires their
 distribution to not be as current with open source product versions.

 Regards,
 Denise

 Sent from mi iPhone

 > On May 9, 2016, at 8:21 PM, Srini Sydney 
 wrote:
 >
 > Hi guys
 >
 > We are thinking of using one of the 3 big data platforms i.e hortonworks
 , mapr or cloudera . Will use hadoop ,hive , zookeeper, and spark in these
 platforms.
 >
 >
 > Which platform would be better suited for cassandra ?
 >
 >
 > -  sreeni
 >


>>>
>


Re: Data platform support

2016-05-10 Thread Sruti S
Not sure what is meant.. Spark can access HDFS. Why is it in standalone
mode? Please clarify.

On Tue, May 10, 2016 at 11:08 AM, Srini Sydney 
wrote:

> I have a clarification based on your answer -
>
> spark is installed as standalone mode (not hdfs) in SMACK framework. Our
> data lake is in hdfs . How do we overcome this ?
>
>
>  - cheers sreeni
>
>
> On 10 May 2016, at 08:16, vincent gromakowski <
> vincent.gromakow...@gmail.com> wrote:
>
> Maybe a SMACK stack would be a better option for using spark with
> Cassandra...
> On 10 May 2016 8:45 AM, "Srini Sydney" wrote:
>
>> Thanks a lot..denise
>>
>> On 10 May 2016 at 02:42, Denise Rogers  wrote:
>>
>>> It really depends how close you want to stay to the most current
>>> versions of open source community products.
>>>
>>> Cloudera has tended to build more products that requires their
>>> distribution to not be as current with open source product versions.
>>>
>>> Regards,
>>> Denise
>>>
>>> Sent from mi iPhone
>>>
>>> > On May 9, 2016, at 8:21 PM, Srini Sydney 
>>> wrote:
>>> >
>>> > Hi guys
>>> >
>>> > We are thinking of using one of the 3 big data platforms i.e hortonworks
>>> , mapr or cloudera . Will use hadoop ,hive , zookeeper, and spark in these
>>> platforms.
>>> >
>>> >
>>> > Which platform would be better suited for cassandra ?
>>> >
>>> >
>>> > -  sreeni
>>> >
>>>
>>>
>>


Re: Data platform support

2016-05-10 Thread Srini Sydney
I have a clarification based on your answer -

Spark is installed in standalone mode (not on HDFS) in the SMACK framework. Our data 
lake is in HDFS. How do we overcome this?

 
 - cheers sreeni


> On 10 May 2016, at 08:16, vincent gromakowski  
> wrote:
> 
> Maybe a SMACK stack would be a better option for using spark with Cassandra...
> 
> On 10 May 2016 8:45 AM, "Srini Sydney" wrote:
>> Thanks a lot..denise
>> 
>> On 10 May 2016 at 02:42, Denise Rogers  wrote:
>>> It really depends how close you want to stay to the most current versions 
>>> of open source community products.
>>> 
>>> Cloudera has tended to build more products that requires their distribution 
>>> to not be as current with open source product versions.
>>> 
>>> Regards,
>>> Denise
>>> 
>>> Sent from mi iPhone
>>> 
>>> > On May 9, 2016, at 8:21 PM, Srini Sydney  wrote:
>>> >
>>> > Hi guys
>>> >
>>> > We are thinking of using one of the 3 big data platforms i.e hortonworks , 
>>> > mapr or cloudera . Will use hadoop ,hive , zookeeper, and spark in these 
>>> > platforms.
>>> >
>>> >
>>> > Which platform would be better suited for cassandra ?
>>> >
>>> >
>>> > -  sreeni
>>> >


RE: A question to 'paging' support in DataStax java driver

2016-05-10 Thread Sebastian Estevez
I think this request belongs in the java driver jira not the Cassandra jira.

https://datastax-oss.atlassian.net/projects/JAVA/

all the best,

Sebastián
On May 10, 2016 1:09 AM, "Lu, Boying"  wrote:

> I filed a JIRA https://issues.apache.org/jira/browse/CASSANDRA-11741 to
> track this.
>
>
>
> *From:* DuyHai Doan [mailto:doanduy...@gmail.com]
> *Sent:* May 10, 2016 12:47
> *To:* user@cassandra.apache.org
> *Subject:* Re: A question to 'paging' support in DataStax java driver
>
>
>
> I guess it's technically possible but then we'll need to update the binary
> protocol. Just create a JIRA and ask for this feature
>
>
>
> On Tue, May 10, 2016 at 5:00 AM, Lu, Boying  wrote:
>
> Thanks very much.
>
>
>
> I understand that the data needs to be read from the DB to get the next
> ‘PagingState’.
>
>
>
> But is it possible not to return those data to the client side, just
> returning the ‘PagingState’?
>
> I.e. the data is read on the server side, but not return to client side,
> this can save some bandwidth
>
> between client and server.
>
>
>
>
>
> *From:* DuyHai Doan [mailto:doanduy...@gmail.com]
> *Sent:* May 9, 2016 21:06
> *To:* user@cassandra.apache.org
> *Subject:* Re: A question to 'paging' support in DataStax java driver
>
>
>
> In a truly consistent world (should I say "snapshot isolation" world
> instead), re-reading the same page should yield the same results no matter
> how many new inserts have occurred since the last page read.
>
>
>
> Caching previous page at app level can be a solution but not viable if the
> amount of data is huge, also you'll need a cache layer and deal with cache
> invalidation etc ...
>
>
>
> The point is, providing snapshot isolation in a distributed system is hard
> without some sort of synchronous coordination e.g. global lock (read
> http://www.bailis.org/papers/hat-vldb2014.pdf)
>
>
>
>
>
> On Mon, May 9, 2016 at 2:17 PM, Bhuvan Rawal  wrote:
>
> Hi Doan,
>
>
>
> What does it have to do being eventual consistency? Lets assume a scenario
> with complete consistency and we are at page X, and at the same time some
> inserts/updates happened at page X-2 and we jumped to that.
>
> User will see inconsistent page in that case as well, right? Also in such
> cases how would you design a user facing application (Cache previous pages
> at app level?)
>
>
>
> Regards,
>
> Bhuvan
>
>
>
> On Mon, May 9, 2016 at 4:18 PM, DuyHai Doan  wrote:
>
> "Is it possible to just return PagingState object without returning
> data?" --> No
>
>
>
> Simply because before reading the actual data for each page of N rows, you
> cannot know at which token value a page of data starts...
>
>
>
> And it is worse than that: with paging you don't have any isolation. Let's
> suppose you keep in your application/web front-end the paging states for
> page 1, 2 and 3. Since there are concurrent inserts on the cluster at the
> same time, when you re-use the paging state 2 for example, you may not get
> the same results as the previous read.
>
>
>
> And it is inevitable in an eventual consistent distributed DB world
>
>
>
> On Mon, May 9, 2016 at 12:25 PM, Lu, Boying  wrote:
>
> Hi, All,
>
>
>
> We are considering to use DataStax java driver in our codes. One important
> feature provided by the driver we want to use is ‘paging’.
>
> But according to the
> https://datastax.github.io/java-driver/3.0.0/manual/paging/, it seems
> that we can’t jump between pages.
>
>
>
> Is it possible to just return PagingState object without returning data?
> e.g.  If I want to jump to the page 5 from the page 1,
>
> I need to go through each page from page 1 to page 5,  Is it possible to
> just return the PagingState object of page 1, 2, 3 and 4 without
>
> actual data of each page? This can save some bandwidth at least.
>
>
>
> Thanks in advance.
>
>
>
> Boying
>
>
>
>
>
>
>
>
>
>
>
>
>


Lots of hints, but only on a few nodes

2016-05-10 Thread Erik Forsberg
I have this situation where a few (like, 3-4 out of 84) nodes misbehave. 
Very long GC pauses, dropping out of cluster etc.


This happens while loading data (via CQL), and analyzing metrics it 
looks like on these few nodes, a lot of hints are being generated close 
to the time when they start to misbehave.


Since this is Cassandra 2.0.13, which has a less than optimal hints 
implementation, large numbers of hints are a GC troublemaker.


Again looking at metrics, it looks like hints are being generated for a 
large number of nodes, so it doesn't look like the destination nodes are 
at fault. So, I'm confused.


Any Hints (pun intended) on what could cause a few nodes to generate 
more hints than the rest of the cluster?


Regards,
\EF


Low cardinality secondary index behaviour

2016-05-10 Thread Atul Saroha
I have a concern over using a secondary index on a field with low cardinality.
Let's say I have a few billion rows, each row can be classified into 1000
categories, and we have a 50-node cluster.

Now we want to fetch data for a single category using the secondary index,
and the query is paginated too, with a fetch size of, say, 5000.

Since a query on a secondary index works as a scatter-gather operation driven
by the coordinator node, would it lead to out-of-memory errors on the
coordinator, or to excessive timeout errors?

How does pagination (token-level data fetching) behave in the scatter-gather
approach?

Secondly, what if we create an inverted table with the category as the partition
key? That would put lots of data on a single node, and it might lead to a
hot-shard issue and poor data-fetching performance, since a single partition
would hold millions of rows.

How should we tackle such a low-cardinality index in Cassandra?
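
One common way to tackle the hot-shard problem described above is to add an
artificial bucket to the partition key of the inverted table, so that a single
hot category is spread over many partitions. A rough sketch in Scala (the
bucket count and key layout are only illustrative):

// Inverted-table sketch, keyed as PRIMARY KEY ((category, bucket), item_id).
// Each write picks a deterministic bucket for its item; a read for one
// category fans out over all numBuckets partitions (e.g. parallel queries).
val numBuckets = 32
def bucketFor(itemId: java.util.UUID): Int =
  Math.floorMod(itemId.hashCode, numBuckets)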

Thanks
-
Atul Saroha
*Lead Software Engineer*

Plot # 362, ASF Centre - Tower A, Udyog Vihar,
 Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA


Re: COPY TO export fails with

2016-05-10 Thread Matthias Niehoff
Hi,

I already suspected that COPY TO might not be the best way to do this. I’ll write a
small Spark job.

Thanks
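
For reference, a minimal sketch of such an export job using the
spark-cassandra-connector (keyspace, table, contact host and output path are
placeholders, and the row-to-line mapping would need to match the desired CSV
format):

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object ExportTable {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cassandra-export")
      .set("spark.cassandra.connection.host", "your-cassandra-host")
    val sc = new SparkContext(conf)

    // Read the table as CassandraRow objects and write one text line per row.
    sc.cassandraTable("your_keyspace", "your_table")
      .map(_.toString)   // replace with proper per-column CSV formatting
      .saveAsTextFile("/export/your_table")

    sc.stop()
  }
}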

2016-05-10 10:36 GMT+02:00 Carlos Rolo :

> Hello,
>
> That is a lot of data to do a "COPY TO".
>
> If you want a fast way to export, and you're fine with Java, you can use
> Cassandra SSTableReader classes to read the sstables directly. Spark also
> works.
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
> *linkedin.com/in/carlosjuzarterolo
> *
> Mobile: +351 918 918 100
> www.pythian.com
>
> On Tue, May 10, 2016 at 9:33 AM, Matthias Niehoff <
> matthias.nieh...@codecentric.de> wrote:
>
>> sry, sent early..
>>
>> more errors:
>>
>> /export.cql:9:Error for (4549395184516451179, 4560441269902768904): 
>> NoHostAvailable - ('Unable to complete the operation against any hosts', 
>> {: ConnectionException('Host has been marked 
>> down or removed',)}) (will try again later attempt 1 of 5)
>> /export.cql:9:Error for (-2083690356124961461, -2068514534992400755): 
>> NoHostAvailable - ('Unable to complete the operation against any hosts', {}) 
>> (will try again later attempt 1 of 5)
>> /export.cql:9:Error for (-4899866517058128956, -4897773268483324406): 
>> NoHostAvailable - ('Unable to complete the operation against any hosts', {}) 
>> (will try again later attempt 1 of 5)
>> /export.cql:9:Error for (-1435092096023471089, -1434747957681478442): 
>> NoHostAvailable - ('Unable to complete the operation against any hosts', {}) 
>> (will try again later attempt 1 of 5)
>> /export.cql:9:Error for (-2804962318029794069, -2783747272192843127): 
>> NoHostAvailable - ('Unable to complete the operation against any hosts', {}) 
>> (will try again later attempt 1 of 5)
>> /export.cql:9:Error for (-5188633782964403059, -5149722481923709224): 
>> NoHostAvailable - ('Unable to complete the operation against any hosts', {}) 
>> (will try again later attempt 1 of 5)
>>
>>
>>
>> It looks like neither the cluster nor the individual nodes can handle the export.
>>
>> Is the cqlsh copy able to export this amount of data? or should other 
>> methods be used (sstableloader, some custom code, spark…)
>>
>>
>> Best regards
>>
>>
>> 2016-05-10 10:29 GMT+02:00 Matthias Niehoff <
>> matthias.nieh...@codecentric.de>:
>>
>>> Hi,
>>>
>>> i try to export data of a table (~15GB) using the cqlsh copy to. It
>>> fails with „no host available“. If I try it with a smaller table everything
>>> works fine.
>>>
>>> The statistics of the big table:
>>>
>>> SSTable count: 81
>>> Space used (live): 14102945336
>>> Space used (total): 14102945336
>>> Space used by snapshots (total): 62482577
>>> Off heap memory used (total): 16399540
>>> SSTable Compression Ratio: 0.1863544514417909
>>> Number of keys (estimate): 5034845
>>> Memtable cell count: 5590
>>> Memtable data size: 18579542
>>> Memtable off heap memory used: 0
>>> Memtable switch count: 72
>>> Local read count: 0
>>> Local read latency: NaN ms
>>> Local write count: 139878
>>> Local write latency: 0.023 ms
>>> Pending flushes: 0
>>> Bloom filter false positives: 0
>>> Bloom filter false ratio: 0.0
>>> Bloom filter space used: 6224240
>>> Bloom filter off heap memory used: 6223592
>>> Index summary off heap memory used: 1098860
>>> Compression metadata off heap memory used: 9077088
>>> Compacted partition minimum bytes: 373
>>> Compacted partition maximum bytes: 1358102
>>> Compacted partition mean bytes: 16252
>>> Average live cells per slice (last five minutes): 0.0
>>> Maximum live cells per slice (last five minutes): 0.0
>>> Average tombstones per slice (last five minutes): 0.0
>>> Maximum tombstones per slice (last five minutes): 0.0
>>>
>>>
>>> Some of the errors:
>>>
>>> /export.cql:9:Error for (269754647900342974, 272655475232221549): 
>>> OperationTimedOut - errors={}, last_host=10.1.12.89 (will try again later 
>>> attempt 1 of 5)
>>> /export.cql:9:Error for (-3191598516608295217, -3188807168672208162): 
>>> OperationTimedOut - errors={}, last_host=10.1.12.89 (will try again later 
>>> attempt 1 of 5)
>>> /export.cql:9:Error for (-3066009427947359685, -3058745599093267591): 
>>> OperationTimedOut - errors={}, last_host=10.1.8.5 (will try again later 
>>> attempt 1 of 5)
>>> /export.cql:9:Error for (-1737068099173540127, -1716693115263588178): 
>>> OperationTimedOut - errors={}, last_host=10.1.8.5 (will try 

Re: [C*3.0.3]lucene indexes not deleted and nodetool repair makes DC unavailable

2016-05-10 Thread Eduardo Alonso
Hi all,
Sorry, I tested with an old index jar. The cassandra-3.0.3 and
dsc-cassandra-3.0.3 packages are the same. The error happens in both; I
think we have fixed it and it will be included in the next release (maybe
3.0.5.1).

1.- Full repair is very intensive; that's why your cluster is unresponsive
during repair. You can try several ways to avoid this:

 A.- Run nodetool repair -pr on every node, one at a time. You'll divide
the repair into three slots.
 B.- Maybe your use case fits incremental repairs

Cheers

Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // @stratiobd

2016-05-09 8:34 GMT+02:00 Siddharth Verma :

> Hi Eduardo,
> Thanks for your help on stratio index problem
>
> As per your questions.
>
> 1. We ran nodetool repair on one box(no range repair), but due to it,
> entire DC was non responsive.
> It was up, but we were not able to connect.
>
> 2. RF is 3, and we have 2 DCs each with 3 nodes.
>
> 3. Consistency level for writes is Local_Quorum.
>
> Thanks
> Siddharth Verma
>


Re: COPY TO export fails with

2016-05-10 Thread Carlos Rolo
Hello,

That is a lot of data to do a "COPY TO".

If you want a fast way to export, and you're fine with Java, you can use
Cassandra SSTableReader classes to read the sstables directly. Spark also
works.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant / Datastax Certified Architect / Cassandra MVP

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
*linkedin.com/in/carlosjuzarterolo
*
Mobile: +351 918 918 100
www.pythian.com

On Tue, May 10, 2016 at 9:33 AM, Matthias Niehoff <
matthias.nieh...@codecentric.de> wrote:

> sry, sent early..
>
> more errors:
>
> /export.cql:9:Error for (4549395184516451179, 4560441269902768904): 
> NoHostAvailable - ('Unable to complete the operation against any hosts', 
> {: ConnectionException('Host has been marked down 
> or removed',)}) (will try again later attempt 1 of 5)
> /export.cql:9:Error for (-2083690356124961461, -2068514534992400755): 
> NoHostAvailable - ('Unable to complete the operation against any hosts', {}) 
> (will try again later attempt 1 of 5)
> /export.cql:9:Error for (-4899866517058128956, -4897773268483324406): 
> NoHostAvailable - ('Unable to complete the operation against any hosts', {}) 
> (will try again later attempt 1 of 5)
> /export.cql:9:Error for (-1435092096023471089, -1434747957681478442): 
> NoHostAvailable - ('Unable to complete the operation against any hosts', {}) 
> (will try again later attempt 1 of 5)
> /export.cql:9:Error for (-2804962318029794069, -2783747272192843127): 
> NoHostAvailable - ('Unable to complete the operation against any hosts', {}) 
> (will try again later attempt 1 of 5)
> /export.cql:9:Error for (-5188633782964403059, -5149722481923709224): 
> NoHostAvailable - ('Unable to complete the operation against any hosts', {}) 
> (will try again later attempt 1 of 5)
>
>
>
> It looks like neither the cluster nor the individual nodes can handle the export.
>
> Is the cqlsh copy able to export this amount of data? or should other methods 
> be used (sstableloader, some custom code, spark…)
>
>
> Best regards
>
>
> 2016-05-10 10:29 GMT+02:00 Matthias Niehoff <
> matthias.nieh...@codecentric.de>:
>
>> Hi,
>>
>> i try to export data of a table (~15GB) using the cqlsh copy to. It fails
>> with „no host available“. If I try it with a smaller table everything works
>> fine.
>>
>> The statistics of the big table:
>>
>> SSTable count: 81
>> Space used (live): 14102945336
>> Space used (total): 14102945336
>> Space used by snapshots (total): 62482577
>> Off heap memory used (total): 16399540
>> SSTable Compression Ratio: 0.1863544514417909
>> Number of keys (estimate): 5034845
>> Memtable cell count: 5590
>> Memtable data size: 18579542
>> Memtable off heap memory used: 0
>> Memtable switch count: 72
>> Local read count: 0
>> Local read latency: NaN ms
>> Local write count: 139878
>> Local write latency: 0.023 ms
>> Pending flushes: 0
>> Bloom filter false positives: 0
>> Bloom filter false ratio: 0.0
>> Bloom filter space used: 6224240
>> Bloom filter off heap memory used: 6223592
>> Index summary off heap memory used: 1098860
>> Compression metadata off heap memory used: 9077088
>> Compacted partition minimum bytes: 373
>> Compacted partition maximum bytes: 1358102
>> Compacted partition mean bytes: 16252
>> Average live cells per slice (last five minutes): 0.0
>> Maximum live cells per slice (last five minutes): 0.0
>> Average tombstones per slice (last five minutes): 0.0
>> Maximum tombstones per slice (last five minutes): 0.0
>>
>>
>> Some of the errors:
>>
>> /export.cql:9:Error for (269754647900342974, 272655475232221549): 
>> OperationTimedOut - errors={}, last_host=10.1.12.89 (will try again later 
>> attempt 1 of 5)
>> /export.cql:9:Error for (-3191598516608295217, -3188807168672208162): 
>> OperationTimedOut - errors={}, last_host=10.1.12.89 (will try again later 
>> attempt 1 of 5)
>> /export.cql:9:Error for (-3066009427947359685, -3058745599093267591): 
>> OperationTimedOut - errors={}, last_host=10.1.8.5 (will try again later 
>> attempt 1 of 5)
>> /export.cql:9:Error for (-1737068099173540127, -1716693115263588178): 
>> OperationTimedOut - errors={}, last_host=10.1.8.5 (will try again later 
>> attempt 1 of 5)
>> /export.cql:9:Error for (-655042025062419794, -627527938552757160): 
>> OperationTimedOut - errors={}, last_host=10.1.12.89 (will try again later 
>> attempt 1 of 5)
>> /export.cql:9:Error for (2441403877625910843, 2445504271098651532): 
>> OperationTimedOut - 

Re: COPY TO export fails with

2016-05-10 Thread Matthias Niehoff
sry, sent early..

more errors:

/export.cql:9:Error for (4549395184516451179, 4560441269902768904):
NoHostAvailable - ('Unable to complete the operation against any
hosts', {: ConnectionException('Host has
been marked down or removed',)}) (will try again later attempt 1 of 5)
/export.cql:9:Error for (-2083690356124961461, -2068514534992400755):
NoHostAvailable - ('Unable to complete the operation against any
hosts', {}) (will try again later attempt 1 of 5)
/export.cql:9:Error for (-4899866517058128956, -4897773268483324406):
NoHostAvailable - ('Unable to complete the operation against any
hosts', {}) (will try again later attempt 1 of 5)
/export.cql:9:Error for (-1435092096023471089, -1434747957681478442):
NoHostAvailable - ('Unable to complete the operation against any
hosts', {}) (will try again later attempt 1 of 5)
/export.cql:9:Error for (-2804962318029794069, -2783747272192843127):
NoHostAvailable - ('Unable to complete the operation against any
hosts', {}) (will try again later attempt 1 of 5)
/export.cql:9:Error for (-5188633782964403059, -5149722481923709224):
NoHostAvailable - ('Unable to complete the operation against any
hosts', {}) (will try again later attempt 1 of 5)



It looks like neither the cluster nor the individual nodes can handle the export.

Is the cqlsh copy able to export this amount of data? or should other
methods be used (sstableloader, some custom code, spark…)


Best regards


2016-05-10 10:29 GMT+02:00 Matthias Niehoff :

> Hi,
>
> i try to export data of a table (~15GB) using the cqlsh copy to. It fails
> with „no host available“. If I try it with a smaller table everything works
> fine.
>
> The statistics of the big table:
>
> SSTable count: 81
> Space used (live): 14102945336
> Space used (total): 14102945336
> Space used by snapshots (total): 62482577
> Off heap memory used (total): 16399540
> SSTable Compression Ratio: 0.1863544514417909
> Number of keys (estimate): 5034845
> Memtable cell count: 5590
> Memtable data size: 18579542
> Memtable off heap memory used: 0
> Memtable switch count: 72
> Local read count: 0
> Local read latency: NaN ms
> Local write count: 139878
> Local write latency: 0.023 ms
> Pending flushes: 0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used: 6224240
> Bloom filter off heap memory used: 6223592
> Index summary off heap memory used: 1098860
> Compression metadata off heap memory used: 9077088
> Compacted partition minimum bytes: 373
> Compacted partition maximum bytes: 1358102
> Compacted partition mean bytes: 16252
> Average live cells per slice (last five minutes): 0.0
> Maximum live cells per slice (last five minutes): 0.0
> Average tombstones per slice (last five minutes): 0.0
> Maximum tombstones per slice (last five minutes): 0.0
>
>
> Some of the errors:
>
> /export.cql:9:Error for (269754647900342974, 272655475232221549): 
> OperationTimedOut - errors={}, last_host=10.1.12.89 (will try again later 
> attempt 1 of 5)
> /export.cql:9:Error for (-3191598516608295217, -3188807168672208162): 
> OperationTimedOut - errors={}, last_host=10.1.12.89 (will try again later 
> attempt 1 of 5)
> /export.cql:9:Error for (-3066009427947359685, -3058745599093267591): 
> OperationTimedOut - errors={}, last_host=10.1.8.5 (will try again later 
> attempt 1 of 5)
> /export.cql:9:Error for (-1737068099173540127, -1716693115263588178): 
> OperationTimedOut - errors={}, last_host=10.1.8.5 (will try again later 
> attempt 1 of 5)
> /export.cql:9:Error for (-655042025062419794, -627527938552757160): 
> OperationTimedOut - errors={}, last_host=10.1.12.89 (will try again later 
> attempt 1 of 5)
> /export.cql:9:Error for (2441403877625910843, 2445504271098651532): 
> OperationTimedOut - errors={}, last_host=10.1.12.89 (permanently given up 
> after 1000 rows and 1 attempts)
>
>
> …
>
>
>
> --
> Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0)
> 172.1702676
> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
> www.more4fi.de
>
> Sitz der Gesellschaft: Solingen | HRB 25917| Amtsgericht Wuppertal
> Vorstand: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
> Aufsichtsrat: Patric Fedlmeier (Vorsitzender) . Klaus Jäger . Jürgen Schütz
>

COPY TO export fails with

2016-05-10 Thread Matthias Niehoff
Hi,

i try to export data of a table (~15GB) using the cqlsh copy to. It fails
with „no host available“. If I try it with a smaller table everything works
fine.

The statistics of the big table:

SSTable count: 81
Space used (live): 14102945336
Space used (total): 14102945336
Space used by snapshots (total): 62482577
Off heap memory used (total): 16399540
SSTable Compression Ratio: 0.1863544514417909
Number of keys (estimate): 5034845
Memtable cell count: 5590
Memtable data size: 18579542
Memtable off heap memory used: 0
Memtable switch count: 72
Local read count: 0
Local read latency: NaN ms
Local write count: 139878
Local write latency: 0.023 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 6224240
Bloom filter off heap memory used: 6223592
Index summary off heap memory used: 1098860
Compression metadata off heap memory used: 9077088
Compacted partition minimum bytes: 373
Compacted partition maximum bytes: 1358102
Compacted partition mean bytes: 16252
Average live cells per slice (last five minutes): 0.0
Maximum live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0


Some of the errors:

/export.cql:9:Error for (269754647900342974, 272655475232221549):
OperationTimedOut - errors={}, last_host=10.1.12.89 (will try again
later attempt 1 of 5)
/export.cql:9:Error for (-3191598516608295217, -3188807168672208162):
OperationTimedOut - errors={}, last_host=10.1.12.89 (will try again
later attempt 1 of 5)
/export.cql:9:Error for (-3066009427947359685, -3058745599093267591):
OperationTimedOut - errors={}, last_host=10.1.8.5 (will try again
later attempt 1 of 5)
/export.cql:9:Error for (-1737068099173540127, -1716693115263588178):
OperationTimedOut - errors={}, last_host=10.1.8.5 (will try again
later attempt 1 of 5)
/export.cql:9:Error for (-655042025062419794, -627527938552757160):
OperationTimedOut - errors={}, last_host=10.1.12.89 (will try again
later attempt 1 of 5)
/export.cql:9:Error for (2441403877625910843, 2445504271098651532):
OperationTimedOut - errors={}, last_host=10.1.12.89 (permanently given
up after 1000 rows and 1 attempts)


…



-- 
Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0)
172.1702676
www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
www.more4fi.de

Sitz der Gesellschaft: Solingen | HRB 25917| Amtsgericht Wuppertal
Vorstand: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
Aufsichtsrat: Patric Fedlmeier (Vorsitzender) . Klaus Jäger . Jürgen Schütz



Re: Data platform support

2016-05-10 Thread vincent gromakowski
Maybe a SMACK stack would be a better option for using spark with
Cassandra...
On 10 May 2016 8:45 AM, "Srini Sydney" wrote:

> Thanks a lot..denise
>
> On 10 May 2016 at 02:42, Denise Rogers  wrote:
>
>> It really depends how close you want to stay to the most current versions
>> of open source community products.
>>
>> Cloudera has tended to build more products that requires their
>> distribution to not be as current with open source product versions.
>>
>> Regards,
>> Denise
>>
>> Sent from mi iPhone
>>
>> > On May 9, 2016, at 8:21 PM, Srini Sydney 
>> wrote:
>> >
>> > Hi guys
>> >
>> > We are thinking of using one of the 3 big data platforms i.e hortonworks ,
>> mapr or cloudera . Will use hadoop ,hive , zookeeper, and spark in these
>> platforms.
>> >
>> >
>> > Which platform would be better suited for cassandra ?
>> >
>> >
>> > -  sreeni
>> >
>>
>>
>


Re: Data platform support

2016-05-10 Thread Srini Sydney
Thanks a lot..denise

On 10 May 2016 at 02:42, Denise Rogers  wrote:

> It really depends how close you want to stay to the most current versions
> of open source community products.
>
> Cloudera has tended to build more products that requires their
> distribution to not be as current with open source product versions.
>
> Regards,
> Denise
>
> Sent from mi iPhone
>
> > On May 9, 2016, at 8:21 PM, Srini Sydney  wrote:
> >
> > Hi guys
> >
> > We are thinking of using one of the 3 big data platforms i.e hortonworks ,
> mapr or cloudera . Will use hadoop ,hive , zookeeper, and spark in these
> platforms.
> >
> >
> > Which platform would be better suited for cassandra ?
> >
> >
> > -  sreeni
> >
>
>