Re: [EXTERNAL] Re: COPY command with where condition

2020-01-20 Thread Jean Carlo
Hello

Nobody has mentioned it yet, but you can also use the Spark Cassandra
Connector, preferably if your data set is so big that a simple COPY to CSV
cannot handle it.

Regards

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay



RE: [EXTERNAL] Re: COPY command with where condition

2020-01-17 Thread Durity, Sean R
sstablekeys (in the tools directory?) can extract the actual keys from your 
sstables. You have to run it on each node and then combine and de-dupe the 
final results, but I have used this technique with a query generator to extract 
data more efficiently.
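The combine-and-de-dupe step above can be sketched with standard shell tools. A minimal sketch with illustrative file names and keys; in practice each file would hold one node's sstablekeys output:

```shell
#!/bin/sh
# Illustrative per-node key dumps; real ones would come from running
# sstablekeys against each node's sstables.
printf 'key1\nkey2\n' > node1.keys
printf 'key2\nkey3\n' > node2.keys

# Replicas store the same partitions, so the same key appears in several
# dumps; sort -u merges the files and drops the duplicates.
sort -u node1.keys node2.keys > all.keys
```

The de-duplicated all.keys list can then feed a query generator as described above.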


Sean Durity

From: Chris Splinter 
Sent: Friday, January 17, 2020 1:47 PM
To: adrien ruffie 
Cc: user@cassandra.apache.org; Erick Ramirez 
Subject: [EXTERNAL] Re: COPY command with where condition

Do you know your partition keys?

One option could be to enumerate that list of partition keys in separate commands 
to make the individual operations less expensive for the cluster.

For example:
Say your partition key column is called id and the ids in your database are 
[1,2,3]

You could do
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM probe_sensors WHERE id = 1 AND localisation_id = 208812" -url /home/dump
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM probe_sensors WHERE id = 2 AND localisation_id = 208812" -url /home/dump
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM probe_sensors WHERE id = 3 AND localisation_id = 208812" -url /home/dump
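If the list of ids is long, the commands above can be generated in a loop instead of typed by hand. A minimal sketch, assuming the same hypothetical table and column names as in the example; note that each run is pointed at its own output directory so the exports do not collide, and the generated file can be piped to `sh` to actually run dsbulk:

```shell
#!/bin/sh
# Write one dsbulk unload command per partition key to a file.
# The id list is illustrative; it could come from an sstablekeys dump.
for id in 1 2 3; do
  printf '%s\n' "./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query \"SELECT * FROM probe_sensors WHERE id = ${id} AND localisation_id = 208812\" -url /home/dump/${id}"
done > dsbulk_cmds.txt
```

Review dsbulk_cmds.txt, then run it with `sh dsbulk_cmds.txt`.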


Does that option work for you?



On Fri, Jan 17, 2020 at 12:17 PM adrien ruffie wrote:
I don't really know yet for the production environment, but in the 
development environment the table contains more than 10,000,000 rows.
But we only need a subset of this table, not the entire thing ...

From: Chris Splinter
Sent: Friday, January 17, 2020 5:40 PM
To: adrien ruffie
Cc: user@cassandra.apache.org; Erick Ramirez
Subject: Re: COPY command with where condition

What you are seeing there is a standard read timeout; how many rows do you 
expect back from that query?

On Fri, Jan 17, 2020 at 9:50 AM adrien ruffie wrote:
Thank you very much,

so I do this request, for example:

./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM probe_sensors WHERE localisation_id = 208812 ALLOW FILTERING" -url /home/dump


But I get the following error:
com.datastax.dsbulk.executor.api.exception.BulkExecutionException: Statement 
execution failed: SELECT * FROM crt_sensors WHERE site_id = 208812 ALLOW 
FILTERING (Cassandra timeout during read query at consistency LOCAL_ONE (1 
responses were required but only 0 replica responded))

I configured my driver with the following driver.conf, but nothing works 
correctly. Do you know what the problem is?

datastax-java-driver {
  basic {
    contact-points = ["data1.com:9042", "data2.com:9042"]
    request {
      timeout = "200"
      consistency = "LOCAL_ONE"
    }
  }
  advanced {
    auth-provider {
      class = PlainTextAuthProvider
      username = "superuser"
      password = "mypass"
    }
  }
}
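One thing worth checking in that config (an assumption on my part, not a confirmed diagnosis): `basic.request.timeout` in the Java driver 4.x HOCON file is a duration, and a bare "200" is read as 200 milliseconds, which is far too short for a full-table ALLOW FILTERING scan and would produce exactly this kind of read timeout. A fragment with an explicit unit might look like:

```hocon
datastax-java-driver {
  basic.request {
    # An explicit duration unit removes the ambiguity; the value itself
    # is only illustrative and should match the expected scan time.
    timeout = "2 minutes"
  }
}
```

The server-side read timeout may also need raising for such a scan.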

From: Chris Splinter
Sent: Friday, January 17, 2020 4:17 PM
To: user@cassandra.apache.org
Cc: Erick Ramirez
Subject: Re: COPY command with where condition

DSBulk has an option that lets you specify the query (including a WHERE 
clause).

See Example 19 in this blog post for details: 
https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading

On Fri, Jan 17, 2020 at 7:34 AM Jean Tremblay wrote:
Did you think about using a Materialised View to generate what you want to 
keep, and then use DSBulk to extract the data?


On 17 Jan 2020, at 14:30, adrien ruffie wrote:

Sorry, I am coming back with a quick question about the bulk loader ...

https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader

I read this: "Operations such as converting strings to lowercase, arithmetic 
on input columns, or filtering out row