REMINDER: Apache EU Roadshow 2018 in Berlin is less than 2 weeks away!

2018-05-31 Thread sharan

Hello Apache Supporters and Enthusiasts

This is a reminder that our Apache EU Roadshow in Berlin is less than
two weeks away and we need your help to spread the word. Please let your
work colleagues, friends and anyone interested in attending know
about our Apache EU Roadshow event.


We have a great schedule including tracks on Apache Tomcat, Apache HTTP
Server, Microservices, Internet of Things (IoT) and Cloud Technologies.
You can find more details at the link below:


https://s.apache.org/0hnG

Ticket prices will be going up on 8th June 2018, so please make sure
that you register soon if you want to beat the price increase. 
https://foss-backstage.de/tickets


Remember that registering for the Apache EU Roadshow also gives you 
access to FOSS Backstage so you can attend any talks and workshops from 
both conferences. And don’t forget that our Apache Lounge will be open 
throughout the whole conference as a place to meet up, hack and relax.


We look forward to seeing you in Berlin!

Thanks
Sharan Foga,  VP Apache Community Development

http://apachecon.com/
@apachecon

PLEASE NOTE: You are receiving this message because you are subscribed 
to a user@ or dev@ list of one or more Apache Software Foundation projects.


Re: Mongo DB vs Cassandra

2018-05-31 Thread Joseph Arriola
Based on the metrics you describe, and since you mention high availability,
I think a suitable big data architecture would be Cassandra with Spark. The
APIs could use Node.js. This combination is powerful; the challenge is in
the data model.

On the other hand, if you are willing to sacrifice high availability and
accept slower response times, MongoDB can be easier to implement.



On Thu, May 31, 2018 at 10:01 AM, Sudhakar Ganesan wrote:

> At a high level, the machines on the production line provide data as CSV
> files at intervals ranging from 1 second to 1 minute to 1 day (depending on
> the machine type used in the line operations). I need to parse those files,
> load them into a DB, and build an API layer to expose the data to downstream
> systems.
>
>
>
> *Number of files to be processed: 13,889,660,134 per day*
>
> *Each file could range from 20 KB to 600 MB, which translates into a few
> hundred rows to millions of rows.*
>
> *High availability with high write volume. Reads are fewer than writes.*
>
> *While extracting the rows, a few validations need to be performed.*
>
> *Build an API layer on top of the data to be persisted in the DB.*
>
>
>
> Now, tell me what would be the best choice…
>
>
>
> *From:* Russell Bateman [mailto:r...@windofkeltia.com]
> *Sent:* Thursday, May 31, 2018 7:36 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Mongo DB vs Cassandra
>
>
>
> Sudhakar,
>
> MongoDB will accommodate loading CSV without regard to schema while still
> creating identifiable "columns" in the database, but you'll have to predict
> or back-impose some schema later if you're going to create indices for fast
> searching of the data. You can perform searching of data without indexing
> in MongoDB, but it's slower.
>
> Cassandra will require you to understand the schema, i.e.: what the
> columns are up front unless you're just going to store the data without
> schema and, therefore, without ability to search effectively.
>
> As suggested already, you should share more detail if you want good
> advice. Both DBs are excellent. Both do different things in different ways.
>
> Hope this helps,
> Russ
>
> On 05/31/2018 05:49 AM, Sudhakar Ganesan wrote:
>
> Team,
>
>
>
> I need to decide between MongoDB and Cassandra for loading CSV file data
> and storing the CSV files as well. If any of you have done such a study in
> the last couple of months, please share your analysis or observations.
>
>
>
> Regards,
>
> Sudhakar
>
> Legal Disclaimer :
> The information contained in this message may be privileged and
> confidential.
> It is intended to be read only by the individual or entity to whom it is
> addressed
> or by their designee. If the reader of this message is not the intended
> recipient,
> you are on notice that any distribution of this message, in any form,
> is strictly prohibited. If you have received this message in error,
> please immediately notify the sender and delete or destroy any copy of
> this message!
>
>
>


Re: how to immediately delete tombstones

2018-05-31 Thread Alain RODRIGUEZ
Hello,

It's a very common but somewhat complex topic. We wrote about it 2 years
ago and I really think this post might have answers you are looking for:
http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html

Something that you could try (if you do mind ending up with one big SSTable
:)) is to enable 'unchecked_tombstone_compaction'. This removes a pre-check
that is supposed to trigger tombstone compactions only when they are worth it
(i.e. when there are no SSTable overlaps), but that over time proved to be
inefficient from a tombstone eviction perspective.

If data can actually be deleted (no overlaps, gc_grace_seconds lowered or
reached, ...) then changing the option 'unchecked_tombstone_compaction' to
'true' might do a lot of good in terms of disk space. Be aware that a bunch
of compactions might be triggered and that disk usage will first increase
before (possibly) decreasing once the compactions complete.
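
As a rough illustration of the suggestion above, the option can be set with an
ALTER TABLE statement. The sketch below only builds the CQL text and does not
talk to a cluster. The keyspace and table names (my_keyspace, my_table) and the
choice of SizeTieredCompactionStrategy are placeholders that do not come from
this thread, so substitute the table's actual compaction class and any
sub-options it already uses.

```python
def tombstone_compaction_cql(keyspace, table,
                             compaction_class="SizeTieredCompactionStrategy"):
    """Build an ALTER TABLE statement enabling unchecked_tombstone_compaction.

    The statement sets the whole compaction map, so the 'class' entry is
    repeated alongside the new sub-option.
    """
    options = {
        "class": compaction_class,  # assumption: replace with the real strategy
        "unchecked_tombstone_compaction": "true",
    }
    rendered = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{{rendered}}};"


# Hypothetical keyspace and table names, for illustration only.
statement = tombstone_compaction_cql("my_keyspace", "my_table")
print(statement)
```

The printed statement can then be run through cqlsh or a driver of your choice.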

> The gc_grace of table was default (10 days), now i set that to 0, although
> many compactions finished but no space reclaimed so far.
>

Be aware that if some deletes did not reach all the replicas, the data will
eventually come back: by lowering gc_grace_seconds, you no longer give
repairs time to process the data before tombstones are actually evicted.

Also, by setting gc_grace_seconds to 0, you have disabled hints
altogether. gc_grace_seconds should always be at least as long as
'max_hint_window_in_ms'. My colleague Radovan wrote a post with
more information on this:
http://thelastpickle.com/blog/2018/03/21/hinted-handoff-gc-grace-demystified.html
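
Comparing the two settings involves a unit conversion, since one is expressed
in seconds and the other in milliseconds. A minimal sketch of that check,
reading the advice as "gc_grace_seconds should cover at least the hint
window". The 3-hour hint window used here is an assumption; check the value in
your own cassandra.yaml.

```python
def gc_grace_covers_hint_window(gc_grace_seconds, max_hint_window_in_ms):
    """Return True if gc_grace_seconds is at least as long as the hint window."""
    return gc_grace_seconds * 1000 >= max_hint_window_in_ms


DEFAULT_GC_GRACE_SECONDS = 10 * 24 * 3600  # 10 days, the default mentioned in the thread
ASSUMED_HINT_WINDOW_MS = 3 * 3600 * 1000   # assumption: a 3-hour hint window

# Default gc_grace_seconds compared with the assumed hint window.
print(gc_grace_covers_hint_window(DEFAULT_GC_GRACE_SECONDS, ASSUMED_HINT_WINDOW_MS))
# gc_grace_seconds lowered to 0, as described in the question.
print(gc_grace_covers_hint_window(0, ASSUMED_HINT_WINDOW_MS))
```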

Good luck with your tombstones, again, those are a bit tricky to handle
sometimes ;-)

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-05-31 16:19 GMT+01:00 Nicolas Guyomar:

> Hi,
>
> You need to force a compaction manually (nodetool compact), if you do not
> mind ending up with one big SSTable.
>
> On 31 May 2018 at 11:07, onmstester onmstester 
> wrote:
>
>> Hi,
>> I've deleted 50% of my data row by row, and now disk usage of the
>> Cassandra data is more than 80%.
>> The gc_grace of the table was the default (10 days). I have now set it to
>> 0, but although many compactions have finished, no space has been reclaimed.
>> How can I force deletion of tombstones in SSTables and reclaim the disk
>> space used by deleted rows?
>> I'm using cassandra on a single node.
>>
>


Re: Mongo DB vs Cassandra

2018-05-31 Thread Jonathan Haddad
I haven’t seen any query requirements, which is going to be the thing that
makes Cassandra difficult.

If you can’t define your queries beforehand, Cassandra is a no-go. If you
just want to store data somewhere, and it’s just CSV, I’d go with a simple
blob store like S3 and pick a DB later when you understand the problem
better.

On Thu, May 31, 2018 at 9:06 AM daemeon reiydelle 
wrote:

> If you are starting with a modest amount of data (e.g. under .25 PB) and
> do not have extremely high availability requirements, then it is easier to
> start with MongoDB, avoiding HA clusters. I would suggest you start with
> MongoDB. Both are great, but C* scales far beyond MongoDB FOR A GIVEN LEVEL
> OF DBA ADMIN AND CONFIG.
>
>
> <==>
> "When I finish a project for a client, I have ... learned their issues
> with life,
> their personal secrets, I have come to care about them.
> Once the project is over, I lose them as if I lost family.
> For the client, however, they’ve just dismissed a service worker." ...
> "Thought on the Gig Economy" by Francine Brevetti
>
>
> *Daemeon C.M. ReiydelleSan Francisco 1.415.501.0198/London 44 020 8144
> 9872/Skype daemeon.c.m.reiydelle*
>
>
> On Thu, May 31, 2018 at 4:49 AM, Sudhakar Ganesan <
> sudhakar.gane...@flex.com.invalid> wrote:
>
>> Team,
>>
>>
>>
>> I need to decide between MongoDB and Cassandra for loading CSV file data
>> and storing the CSV files as well. If any of you have done such a study in
>> the last couple of months, please share your analysis or observations.
>>
>>
>>
>> Regards,
>>
>> Sudhakar
>>
>
> --
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Mongo DB vs Cassandra

2018-05-31 Thread Jeff Jirsa
277 TB/day seems like the type of task I'd not trust to random mailing list
advice.

Cassandra can do that, but it's nontrivial. MongoDB may be able to do it,
too (not sure). A lot of it will depend on how you're trying to query the
data.
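
One way to arrive at roughly 277 TB/day is to multiply the file count given in
this thread (13,889,660,134 files per day) by the smallest stated file size
(20 KB). The sketch below works through that arithmetic. It assumes decimal
units (1 KB = 1000 bytes), which is a guess about how the figure was derived,
and it also shows the upper bound and the implied file rate.

```python
FILES_PER_DAY = 13_889_660_134    # figure stated in the thread
MIN_FILE_BYTES = 20 * 1000        # 20 KB, assuming decimal units
MAX_FILE_BYTES = 600 * 1000 ** 2  # 600 MB, assuming decimal units
SECONDS_PER_DAY = 24 * 60 * 60


def terabytes_per_day(files_per_day, bytes_per_file):
    """Daily volume in decimal terabytes for a given per-file size."""
    return files_per_day * bytes_per_file / 1000 ** 4


low_tb = terabytes_per_day(FILES_PER_DAY, MIN_FILE_BYTES)
high_tb = terabytes_per_day(FILES_PER_DAY, MAX_FILE_BYTES)
files_per_second = FILES_PER_DAY / SECONDS_PER_DAY

print(f"lower bound: {low_tb:.1f} TB/day")
print(f"upper bound: {high_tb:.1f} TB/day")
print(f"file rate:   {files_per_second:.0f} files/second")
```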





Re: Mongo DB vs Cassandra

2018-05-31 Thread daemeon reiydelle
If you are starting with a modest amount of data (e.g. under .25 PB) and do
not have extremely high availability requirements, then it is easier to
start with MongoDB, avoiding HA clusters. I would suggest you start with
MongoDB. Both are great, but C* scales far beyond MongoDB FOR A GIVEN LEVEL
OF DBA ADMIN AND CONFIG.


<==>
"When I finish a project for a client, I have ... learned their issues with
life,
their personal secrets, I have come to care about them.
Once the project is over, I lose them as if I lost family.
For the client, however, they’ve just dismissed a service worker." ...
"Thought on the Gig Economy" by Francine Brevetti


*Daemeon C.M. Reiydelle, San Francisco 1.415.501.0198/London 44 020 8144
9872/Skype daemeon.c.m.reiydelle*


On Thu, May 31, 2018 at 4:49 AM, Sudhakar Ganesan <
sudhakar.gane...@flex.com.invalid> wrote:

> Team,
>
>
>
> I need to decide between MongoDB and Cassandra for loading CSV file data
> and storing the CSV files as well. If any of you have done such a study in
> the last couple of months, please share your analysis or observations.
>
>
>
> Regards,
>
> Sudhakar
>


RE: Mongo DB vs Cassandra

2018-05-31 Thread Sudhakar Ganesan
At a high level, the machines on the production line provide data as CSV
files at intervals ranging from 1 second to 1 minute to 1 day (depending on
the machine type used in the line operations). I need to parse those files,
load them into a DB, and build an API layer to expose the data to downstream
systems.

Number of files to be processed: 13,889,660,134 per day
Each file could range from 20 KB to 600 MB, which translates into a few hundred
rows to millions of rows.
High availability with high write volume. Reads are fewer than writes.
While extracting the rows, a few validations need to be performed.
Build an API layer on top of the data to be persisted in the DB.

Now, tell me what would be the best choice...
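
The parse-and-validate step described above could look roughly like the sketch
below, independent of which database is chosen. The column names (machine_id,
timestamp, value) and the validation rules are hypothetical, since the thread
does not describe the actual CSV layout.

```python
import csv
import io

# Hypothetical schema: the real column names are not given in the thread.
REQUIRED_COLUMNS = ("machine_id", "timestamp", "value")


def valid_rows(csv_text):
    """Yield rows that pass basic validation and skip the rest."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Skip rows where a required field is missing or empty.
        if any(not (row.get(col) or "").strip() for col in REQUIRED_COLUMNS):
            continue
        # Skip rows whose measurement is not numeric.
        try:
            row["value"] = float(row["value"])
        except ValueError:
            continue
        yield row


sample = (
    "machine_id,timestamp,value\n"
    "m1,2018-05-31T10:00:00,1.5\n"
    "m2,,2.0\n"
    "m3,2018-05-31T10:00:01,abc\n"
)
rows = list(valid_rows(sample))
print(len(rows), "valid row(s)")
```

Using a generator keeps memory use flat for the larger files, since rows are
validated one at a time instead of being loaded all at once.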




Re: how to immediately delete tombstones

2018-05-31 Thread Nicolas Guyomar
Hi,

You need to force a compaction manually (nodetool compact), if you do not
mind ending up with one big SSTable.

On 31 May 2018 at 11:07, onmstester onmstester  wrote:

> Hi,
> I've deleted 50% of my data row by row, and now disk usage of the Cassandra
> data is more than 80%.
> The gc_grace of the table was the default (10 days). I have now set it to 0,
> but although many compactions have finished, no space has been reclaimed.
> How can I force deletion of tombstones in SSTables and reclaim the disk
> space used by deleted rows?
> I'm using cassandra on a single node.
>


how to immediately delete tombstones

2018-05-31 Thread onmstester onmstester
Hi, 

I've deleted 50% of my data row by row, and now disk usage of the Cassandra
data is more than 80%.

The gc_grace of the table was the default (10 days). I have now set it to 0,
but although many compactions have finished, no space has been reclaimed so far.

How can I force deletion of tombstones in SSTables and reclaim the disk space
used by deleted rows?

I'm using Cassandra on a single node.


Re: Mongo DB vs Cassandra

2018-05-31 Thread Russell Bateman

Sudhakar,

MongoDB will accommodate loading CSV without regard to schema while 
still creating identifiable "columns" in the database, but you'll have 
to predict or back-impose some schema later if you're going to create 
indices for fast searching of the data. You can perform searching of 
data without indexing in MongoDB, but it's slower.


Cassandra will require you to understand the schema, i.e.: what the 
columns are up front unless you're just going to store the data without 
schema and, therefore, without ability to search effectively.


As suggested already, you should share more detail if you want good 
advice. Both DBs are excellent. Both do different things in different ways.


Hope this helps,
Russ

On 05/31/2018 05:49 AM, Sudhakar Ganesan wrote:


Team,

I need to decide between MongoDB and Cassandra for loading CSV file data
and storing the CSV files as well. If any of you have done such a study in
the last couple of months, please share your analysis or observations.


Regards,

Sudhakar





Re: Mongo DB vs Cassandra

2018-05-31 Thread Joseph Arriola
Hi Sudhakar!

Each one has different goals, which means that they are complementary.
Could you share more detail about the use case so that we can give you
better advice?

On Thu, May 31, 2018 at 5:50 AM, Sudhakar Ganesan wrote:

> Team,
>
>
>
> I need to decide between MongoDB and Cassandra for loading CSV file data
> and storing the CSV files as well. If any of you have done such a study in
> the last couple of months, please share your analysis or observations.
>
>
>
> Regards,
>
> Sudhakar
>


Mongo DB vs Cassandra

2018-05-31 Thread Sudhakar Ganesan
Team,

I need to decide between MongoDB and Cassandra for loading CSV file data
and storing the CSV files as well. If any of you have done such a study in
the last couple of months, please share your analysis or observations.

Regards,
Sudhakar



Re: Certified Cassandra for Enterprise use

2018-05-31 Thread Rahul Singh
To be as objective as possible:

Product vendors
DataStax
Stratio

Infrastructure / Database as a Service
Instaclustr
CosmosDB on Azure

Container Orchestration
Mesosphere (DC/OS creator) has limited support for “certified” Cassandra and DSE
containers on Mesos


Disclosure: our firm is a DataStax services partner.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation
On May 29, 2018, 4:01 AM -0400, Ben Slater wrote:
> Hi Pranay
>
> We (Instaclustr) provide enterprise support for Cassandra 
> (https://www.instaclustr.com/services/cassandra-support/) which may cover 
> what you are looking for.
>
> Please get in touch direct if you would like to discuss.
>
> Cheers
> Ben
>
> > On Tue, 29 May 2018 at 10:11 Pranay akula  
> > wrote:
> > > Is there any third party that provides security patches/releases for
> > > Apache Cassandra?
> > >
> > > For enterprise use, is there any third party that provides certified
> > > Apache Cassandra packages?
> > >
> > > Thanks
> > > Pranay
> --
> Ben Slater
> Chief Product Officer
>
> This email has been sent on behalf of Instaclustr Pty. Limited (Australia) 
> and Instaclustr Inc (USA).
> This email and any attachments may contain confidential and legally 
> privileged information.  If you are not the intended recipient, do not copy 
> or disclose its content, but please reply to this email immediately and 
> highlight the error to the sender and then immediately delete the message.