Re: Query timed out after PT2M

2022-02-08 Thread Joe Obernberger
Hi Joe, it looks like "PT2M" may refer to a timeout value
that could be set by your Spark job's initialization of the
client. I don't see a string matching this in the Cassandra
codebase itself, but I do see that this is parseable as a
Duration.

```
jshell> java.time.Duration.parse("PT2M").getSeconds()
$7 ==> 120
```
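
For readers without jshell handy, a minimal plain-Java equivalent (the class and method are mine, not from the thread); it also shows that Duration's toString() round-trips to the same ISO-8601 form the driver prints:

```java
import java.time.Duration;

public class Pt2mCheck {
    public static void main(String[] args) {
        // "PT2M" is ISO-8601 notation for a two-minute period.
        Duration timeout = Duration.parse("PT2M");
        System.out.println(timeout.getSeconds());   // 120
        // Duration.toString() emits the same notation, e.g. for 16 minutes:
        System.out.println(Duration.ofMinutes(16)); // PT16M
    }
}
```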

The server-side log you see is likely an indicator of the
timeout from the server's perspective. You might consider
checking logs from the replicas for dropped reads, query
aborts due to scanning more tombstones than the configured
max, or other conditions indicating overload/inability to
serve a response.

If you're running a Spark job, I'd recommend using the
DataStax Spark Cassandra Connector which distributes your
query to executors addressing slices of the token range which
will land on replica sets, avoiding the scatter-gather
behavior that can occur if using the Java driver alone.
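
For concreteness, a sketch of such a connector-backed read, borrowing the `spark` session and the `doc.doc` table from later in this thread; the partition-count check is my own illustration of the token-range split, not something from the original messages:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// The connector's DataFrame source plans one Spark input partition per
// token-range slice, so each executor task scans a contiguous slice owned
// by one replica set rather than scatter-gathering through a coordinator.
Dataset<Row> df = spark.read()
        .format("org.apache.spark.sql.cassandra")
        .option("keyspace", "doc")
        .option("table", "doc")
        .load();

// Rough way to observe the split: number of planned input partitions.
System.out.println(df.rdd().getNumPartitions());
```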

Cheers,

– Scott



On Feb 3, 2022, at 11:42 AM, Joe Obernberger wrote:


Hi all - using Cassandra 4.0.1 and a Spark job running against a large table (~8 billion rows), and I'm getting this error on the client side:
Query timed out after PT2M

On the server side I see a lot of messages like:
DEBUG [Native-Transport-Requests-39] 2022-02-03 14:39:56,647 ReadCallback.java:119 - Timed out; received 0 of 1 responses

The same code works on another table in the same Cassandra cluster that is about 300 million rows and completes in about 2 minutes. The cluster is 13 nodes.

I can't find what PT2M means. Perhaps the table needs a repair? Other ideas?
Thank you!

-Joe






Re: Query timed out after PT2M

2022-02-07 Thread Joe Obernberger
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2403)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2402)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2402)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1160)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1160)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1160)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2642)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2584)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2573)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
Caused by: com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT16M
    at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.lambda$scheduleTimeout$1(CqlRequestHandler.java:206)
    at com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
    at com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
    at com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
    at com.datastax.oss.driver.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)

-Joe
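
PT16M in the trace above is again an ISO-8601 duration: the driver-side request timeout had evidently been raised to 16 minutes and still expired. As a hedged sketch only (assuming Java driver 4.x; none of this code is from the thread, and when going through the connector the relevant knob is the connector's own read timeout discussed below), this is roughly where that request timeout lives when building a standalone driver session:

```java
import java.time.Duration;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;

public class DriverTimeoutSketch {
    public static void main(String[] args) {
        // REQUEST_TIMEOUT holds the Duration that DriverTimeoutException
        // renders in ISO-8601 form ("PT16M" == 16 minutes).
        DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
                .withDuration(DefaultDriverOption.REQUEST_TIMEOUT, Duration.ofMinutes(16))
                .build();
        try (CqlSession session = CqlSession.builder()
                .withConfigLoader(loader)
                .build()) {
            System.out.println(session.getName());
        }
    }
}
```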





Re: Query timed out after PT2M

2022-02-03 Thread Joe Obernberger

I did find this:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md

And "spark.cassandra.read.timeoutMS" is set to 12.

Running a test now, and I think that is it.  Thank you Scott.

-Joe
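
A sketch of where that property would go, reusing the SparkSession builder Joe posted below in this thread. The raised value here is purely illustrative; the property name comes from the connector reference linked above:

```java
import org.apache.spark.sql.SparkSession;

// Sketch only: Joe's builder with the connector read timeout raised.
// 600000 ms (10 min) is an arbitrary illustrative value; the property's
// 120000 ms default is what surfaces client-side as "PT2M".
SparkSession spark = SparkSession
        .builder()
        .appName("SparkCassandraApp")
        .config("spark.cassandra.connection.host", "chaos")
        .config("spark.cassandra.connection.port", "9042")
        .config("spark.cassandra.read.timeoutMS", "600000")
        .master("spark://aether.querymasters.com:8181")
        .getOrCreate();
```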


Re: Query timed out after PT2M

2022-02-03 Thread Joe Obernberger

Thank you Scott!
I am using the Spark Cassandra Connector. Code:

SparkSession spark = SparkSession
    .builder()
    .appName("SparkCassandraApp")
    .config("spark.cassandra.connection.host", "chaos")
    .config("spark.cassandra.connection.port", "9042")
    .master("spark://aether.querymasters.com:8181")
    .getOrCreate();

Would I set PT2M in there?  Like .config("pt2m","300") ?
I'm not familiar with jshell, so I'm not sure where you're getting that 
duration from.


Right now, I'm just doing a count:
Dataset<Row> dataset = spark.read().format("org.apache.spark.sql.cassandra")
    .options(new HashMap<String, String>() {
        {
            put("keyspace", "doc");
            put("table", "doc");
        }
    }).load();

dataset.count();


Thank you!

-Joe
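
An aside on the two questions above: "pt2m" is not a configuration key; it is just Duration.toString() of the two-minute timeout (see the 2022-02-03 follow-up further up, where the actual knob turns out to be spark.cassandra.read.timeoutMS). And as a sketch of an alternative to the double-brace HashMap, DataFrameReader accepts the same options directly:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Same count as above, with options passed one by one instead of via an
// anonymous HashMap subclass (a sketch; assumes the `spark` session above).
Dataset<Row> dataset = spark.read()
        .format("org.apache.spark.sql.cassandra")
        .option("keyspace", "doc")
        .option("table", "doc")
        .load();

long rows = dataset.count();
```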


Re: Query timed out after PT2M

2022-02-03 Thread C. Scott Andreas

Hi Joe, it looks like "PT2M" may refer to a timeout value that could be set by your Spark job's initialization of the client. I don't see a string matching this in the Cassandra codebase itself, but I do see that this is parseable as a Duration.

```
jshell> java.time.Duration.parse("PT2M").getSeconds()
$7 ==> 120
```

The server-side log you see is likely an indicator of the timeout from the server's perspective. You might consider checking logs from the replicas for dropped reads, query aborts due to scanning more tombstones than the configured max, or other conditions indicating overload/inability to serve a response.

If you're running a Spark job, I'd recommend using the DataStax Spark Cassandra Connector which distributes your query to executors addressing slices of the token range which will land on replica sets, avoiding the scatter-gather behavior that can occur if using the Java driver alone.

Cheers,

– Scott

Query timed out after PT2M

2022-02-03 Thread Joe Obernberger
Hi all - using Cassandra 4.0.1 and a Spark job running against a large table (~8 billion rows), and I'm getting this error on the client side:

Query timed out after PT2M

On the server side I see a lot of messages like:
DEBUG [Native-Transport-Requests-39] 2022-02-03 14:39:56,647 
ReadCallback.java:119 - Timed out; received 0 of 1 responses


The same code works on another table in the same Cassandra cluster that 
is about 300 million rows and completes in about 2 minutes.  The cluster 
is 13 nodes.


I can't find what PT2M means.  Perhaps the table needs a repair? Other 
ideas?

Thank you!

-Joe