Re: Mechanism to Bulk Export from Cassandra on daily Basis

2020-02-19 Thread Amanda Moran
Hi there-

DataStax recently released their Bulk Loader (dsbulk) into the OSS community.

I would take a look and at least try it out:
https://docs.datastax.com/en/dsbulk/doc/dsbulk/dsbulkAbout.html

Good luck!

Amanda

On Wed, Feb 19, 2020 at 12:10 PM JOHN, BIBIN wrote:

> Thanks for the response. We need to export into a flat file and send it to
> another analytical application. There are 137 tables, and 30 of them
> have 300M+ records, so “COPY TO” is taking a lot of time.
>
>
>
> Thank you
>
> Bibin John
>
>
>
> *From:* Aakash Pandhi 
> *Sent:* Wednesday, February 19, 2020 12:51 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Mechanism to Bulk Export from Cassandra on daily Basis
>
>
>
> John,
>
>
>
> Greetings,
>
>
>
> Is the requirement just to export data from a table and stage it somewhere, or
> to export it and load it into another cluster/table?
>
>
>
> sstableloader is a utility which can help you as it is designed for bulk
> loading.
>
> Sincerely,
>
> Aakash Pandhi
>
>
>
>
>
> On Wednesday, February 19, 2020, 10:13:32 AM PST, JOHN, BIBIN <
> bj9...@att.com> wrote:
>
>
>
>
>
> Team,
>
> We have a requirement to bulk export data from Cassandra on a daily basis.
> The table contains close to 600M records and the cluster has 12 nodes. What is
> the best approach to do this?
>
>
>
>
>
> Thanks
>
> Bibin John
>
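
One common way to speed up a full-table export like the one discussed above
is to split the read into token ranges that can be scanned independently,
with each range written to its own flat file. The sketch below is a minimal
illustration in Python and not how dsbulk or COPY TO is actually implemented.
It assumes the cluster uses Murmur3Partitioner (the default), and the names
split_token_ring, range_query, my_ks, events and id are hypothetical.

```python
# Murmur3Partitioner token bounds (assumption: the cluster uses the default partitioner).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ring(splits):
    """Return `splits` contiguous (start, end) token ranges covering the whole ring."""
    if splits < 1:
        raise ValueError("splits must be >= 1")
    total = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (total * i) // splits for i in range(splits)]
    bounds.append(MAX_TOKEN)
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(keyspace, table, partition_key, start, end, first):
    """Build the CQL SELECT for one token range.

    The lower bound is inclusive only for the first range, so that
    adjacent ranges never return the same row twice.
    """
    lower = ">=" if first else ">"
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) {lower} {start} "
        f"AND token({partition_key}) <= {end}"
    )


if __name__ == "__main__":
    # Hypothetical keyspace, table and partition key, for illustration only.
    for i, (start, end) in enumerate(split_token_ring(4)):
        print(range_query("my_ks", "events", "id", start, end, first=(i == 0)))
```

Each generated query could then be handed to a separate worker, so the 12
nodes share the read load instead of one client paging through the table
serially.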


Re: Apache vs Datastax cassandra

2020-02-04 Thread Amanda Moran
Erick is right ... the folks at DataStax would be happy to help you and probably 
don’t feel comfortable posting here :) 

With that said, I used to work there but I don’t anymore so I feel comfortable 
sending a bit of information your way. 

Here is a very high level overview slide deck of what DataStax does (to the 
theme of Friends) that I presented last year at a DataStax meetup. Might help 
you shape your conversation with the folks there. 

https://github.com/amandamoran/dataStaxMeetup/blob/master/slides/meetMyFriendDataStax.pdf

Thanks!

Amanda 

Sent from my iPhone

> On Feb 3, 2020, at 5:20 PM, Erick Ramirez wrote:
> 
> 
> Adarsh, a very *friendly* note that anyone is more than welcome to ask 
> questions -- in fact as a group it's encouraged -- but a *gentle reminder* 
> that this mailing list is for open-source Apache Cassandra. By all means, 
> feel free to respond and not saying at all that it's not allowed (I'm just 
> another user after all, not affiliated with the ASF) though it might be more 
> appropriate to post your question on community.datastax.com. Cheers!
> 
>> On Mon, Feb 3, 2020 at 8:30 PM Adarsh Kumar wrote:
>> Hello All,
>> 
>> We have a product that uses Postgres/Cassandra as its datastore. We use both 
>> Apache and DataStax Cassandra depending on the client's requirements, but we 
>> never got the chance to explore the exact differences between the two. 
>> Apart from OpsCenter and DSE Graph, we want to know more from a Cassandra perspective.
>> 
>> Also, please provide a link to any blog if available.
>> 
>> Thanks in advance.
>> 
>> Thanks & Regards,
>> Adarsh K


Re: [Announce] DataStax Support for Apache Cassandra, New Tools

2019-12-17 Thread Amanda Moran
I highly recommend folks check out DataStax Bulk Loader (which maybe needs a
new name now that it supports both DSE and C*). It's a very advanced tool
with a lot of much-needed functionality.

On Tue, Dec 17, 2019 at 1:02 PM Jonathan Ellis wrote:

> Hi all,
>
> Today DataStax is pleased to announce Luna: support for Apache
> Cassandra versions 2.1, 2.2, 3.0, and 3.11. The short version is that with
> Luna, we’re making our expertise available to Apache Cassandra users as a
> subscription-based support plan with public pricing that you can buy
> directly through our website. The full announcement is here.
>
> Additionally, as part of our ongoing commitment to Cassandra, we’re also
> announcing the availability of DataStax Bulk Loader and DataStax Apache
> Kafka Connector as free downloads, making loading and unloading data from
> Cassandra faster and easier. Details of this release are here.
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


Re: Cassandra GUI that supports cqlsh through Bastion

2019-09-18 Thread Amanda Moran
Not sure exactly what you are looking for but it's pretty easy to use
Jupyter notebooks with Cassandra.

I set it up not that long ago using this:
https://github.com/slowenthal/cql_kernel and it worked great.

On Wed, Sep 18, 2019 at 3:08 PM Bhavesh Prajapati wrote:

> Hi,
>
>
>
> I am looking for a Cassandra GUI that supports a cqlsh connection to a
> Cassandra node through a bastion/jump host using an SSH key.
>
>
>
> Thanks,
>
> Bhavesh
>


Re: [EXTERNAL] Re: loading big amount of data to Cassandra

2019-08-06 Thread Amanda Moran
With the DataStax Bulk Loader you can only export from a Cassandra table, not 
import into Cassandra (it can only load into a DSE cluster). 

And +1 on the confusing name of batches ... yes it’s for writes but not for 
loading data. 

Amanda 

> On Aug 5, 2019, at 8:14 AM, Durity, Sean R wrote:
> 
> DataStax has a very fast bulk load tool - dsbulk. Not sure if it is 
> available for open source or not. In my experience so far, I am very 
> impressed with it.
> 
> 
> 
> Sean Durity – Staff Systems Engineer, Cassandra
> 
> -----Original Message-----
> From: p...@xvalheru.org 
> Sent: Saturday, August 3, 2019 6:06 AM
> To: user@cassandra.apache.org
> Cc: Dimo Velev 
> Subject: [EXTERNAL] Re: loading big amount of data to Cassandra
> 
> Thanks to all,
> 
> I'll try the SSTables.
> 
> Thanks
> 
> Pat
> 
>> On 2019-08-03 09:54, Dimo Velev wrote:
>> Check out the CQLSSTableWriter java class -
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java
>> . You use it to generate sstables - you need to write a small program
>> for that. You can then stream them over the network using the
>> sstableloader (either use the utility or use the underlying classes to
>> embed it in your program).
>> 
>>> On 3. Aug 2019, at 07:17, Ayub M wrote:
>>> 
>>> Dimo, how do you generate sstables? Do you mean load data locally on
>>> a cassandra node and use sstableloader?
>>> 
>>> On Fri, Aug 2, 2019, 5:48 PM Dimo Velev wrote:
>>> 
>>>> Hi,
>>>>
>>>> Batches will actually slow down the process because they mean a
>>>> different thing in C*: as you read, they just group changes
>>>> together that you want executed atomically.
>>>>
>>>> Cassandra does not really have indices, so that is different from a
>>>> relational DB. However, after writing stuff to Cassandra it
>>>> generates many smallish files (SSTables) holding the data. These are
>>>> then merged in the background (compaction) to improve read performance.
>>>>
>>>> You have two options from my experience:
>>>>
>>>> Option 1: use the normal CQL API in async mode. This will create a
>>>> high CPU load on your cluster. Depending on whether that is fine
>>>> for you, that might be the easiest solution.
>>>>
>>>> Option 2: generate sstables locally and use the sstableloader to
>>>> upload them into the cluster. The streaming does not generate high
>>>> CPU load, so it is a viable option for clusters with other
>>>> operational load.
>>>>
>>>> Option 2 scales with the number of cores of the machine generating
>>>> the sstables. If you can split your data, you can generate sstables
>>>> on multiple machines. In contrast, option 1 scales with your
>>>> cluster. If you have a large cluster that is idling, it would be
>>>> better to use option 1.
>>>>
>>>> With both options I was able to write at about 50-100K rows / sec
>>>> on my laptop and local Cassandra. The speed heavily depends on the
>>>> size of your rows.
>>>>
>>>> Back to your question: I guess option 2 is similar to what you
>>>> are used to from tools like sqlloader for relational DBMSes.
>>>>
>>>> I had a requirement of loading a few hundred million rows per day
>>>> into an operational cluster, so I went with option 2 to offload the
>>>> CPU load and reduce the impact on the reading side during the loads.
>>>>
>>>> Cheers,
>>>> Dimo
>>>>
>>>> Sent from my iPad
>>>>
>>>>> On 2. Aug 2019, at 18:59, p...@xvalheru.org wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I need to upload about 7 billion records to Cassandra. What
>>>>> is the best Cassandra setup for this task? Will using batches
>>>>> speed up the upload (I've read somewhere that batches in Cassandra
>>>>> are meant for atomicity, not for speeding up communication)? How
>>>>> does Cassandra work internally with respect to indexing? In SQL
>>>>> databases, when uploading such an amount of data, it is suggested
>>>>> to turn indexing off and then back on. Is something similar
>>>>> possible in Cassandra?
>>>>>
>>>>> Thanks for all suggestions.
>>>>>
>>>>> Pat
> 
>> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>> 
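
Option 1 in Dimo's message (plain CQL writes in async mode) usually needs a
cap on the number of in-flight requests so the loader does not overwhelm the
cluster. The following is a minimal sketch of that pattern in Python, under
stated assumptions: write_row is a hypothetical stand-in for whatever
performs one insert (with the DataStax Python driver it might wrap
session.execute on a prepared statement), and write_all and max_in_flight
are illustrative names. It uses a thread pool instead of a driver's own
async API, so treat it as an illustration of the throttling idea only.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def write_all(rows, write_row, max_in_flight=64):
    """Call write_row(row) for every row, keeping at most max_in_flight pending.

    Returns (number of rows submitted, list of exceptions raised by write_row).
    """
    slots = threading.BoundedSemaphore(max_in_flight)
    errors = []

    def release(future):
        # Runs when a write finishes: free a slot and record any failure.
        slots.release()
        if future.exception() is not None:
            errors.append(future.exception())

    submitted = 0
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        for row in rows:
            slots.acquire()  # blocks while max_in_flight writes are pending
            pool.submit(write_row, row).add_done_callback(release)
            submitted += 1
    # Leaving the "with" block waits for all outstanding writes to finish.
    return submitted, errors


if __name__ == "__main__":
    def fake_write(row):
        # Stand-in for a real insert; replace with a driver call.
        pass

    count, failures = write_all(range(1000), fake_write, max_in_flight=32)
    print(count, len(failures))
```

Raising max_in_flight pushes more load onto the cluster, which is the
trade-off Dimo describes between option 1 and option 2.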

Re: How do u setup networking for Opening Solr Web Interface when on cloud?

2019-04-02 Thread Amanda Moran
Hi there Krish-

I want to +1 what Rahul said about reaching out to DataStax. Please submit
a support ticket, and DataStax support can help you with this.

Thanks!

Amanda

On Mon, Apr 1, 2019 at 3:47 PM Krish Donald wrote:

> I have searched the internet but did not find any link that worked for me.
>
> Even
> https://s3.amazonaws.com/quickstart-reference/datastax/latest/doc/datastax-enterprise-on-the-aws-cloud.pdf
> mentions using SSH tunneling:
>
> "DSE nodes have no public IP addresses. Access to the web consoles for
> Solr or Spark can be established by using an SSH tunnel. For example, you
> can access the Solr console from http://NODE_IP:8983/solr/. You can bind
> to a local port with a command like the following (replacing the key and IP
> values for those of your cluster):
>
>     ssh -v -i $KEY_FILE -L 8983:$NODE_IP:8983 ubuntu@$OPSC_PUBLIC_IP -N
>
> The Solr console is then accessible at http://127.0.0.1:8983/solr/. When
> you’re prompted to log in, enter the user name cassandra and the password
> you chose."
>
> But I am not looking for an SSH tunneling option.
>
> I tried to follow below link as well:
>
> https://forums.aws.amazon.com/thread.jspa?threadID=31406
>
> But DSE nodes have no public IP addresses so this also did not work.
>
> Thanks
>
>
>
> On Mon, Apr 1, 2019 at 12:32 PM Rahul Singh wrote:
>
>> This is probably not a question for this community but rather for
>> DataStax support or the DataStax Academy Slack group. More specifically,
>> this is a "how to expose Solr securely" question, which is amply answered
>> on the interwebs if you look for it on Google.
>>
>>
>> rahul.xavier.si...@gmail.com
>>
>> http://cassandra.link
>>
>>
>> On Mon, Apr 1, 2019 at 12:19 PM Krish Donald wrote:
>>
>>> Hi,
>>>
>>> We have a DSE Cassandra cluster running on AWS.
>>> We now have a requirement to enable Solr and Spark on the cluster.
>>> Cassandra is on a private data subnet that has connectivity to the app
>>> layer.
>>> From Cassandra, we can't directly open the Solr web interface.
>>> We tried SSH tunneling and it works, but we can't offer SSH
>>> tunneling to developers.
>>>
>>> We would like to create a load balancer and put the Cassandra nodes
>>> behind it, but the question is: what health check do I need to
>>> configure on the load balancer so that it can open the Solr web UI?
>>>
>>> My solution might not be perfect; please suggest any other solution if
>>> you have one.
>>>
>>> Thanks
>>>
>>>


Re: Datastax Java Driver compatibility

2019-01-22 Thread Amanda Moran
Hi there-

I checked with the team here (at DataStax) and this should work. Is there any
reason you need to stick with Java Driver 3.2? There is a 3.6 release.

Thanks!

Amanda

On Tue, Jan 22, 2019 at 8:45 AM Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> Hello,
>
> I am looking for DataStax driver compatibility with Apache Cassandra
> version 3.11.3.
> However, the doc doesn't mention the 3.11 version.
>
> https://docs.datastax.com/en/driver-matrix/doc/driver_matrix/javaDrivers.html
>
> Can someone please confirm whether DataStax Java Driver version 3.2.0 works
> with version 3.11.3 of Apache Cassandra?
> Thanks
>