[jira] [Commented] (SPARK-27780) Shuffle server & client should be versioned to enable smoother upgrade

2021-09-22 Thread Shyam (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418909#comment-17418909
 ] 

Shyam commented on SPARK-27780:
---

[~comma337] Can you please let me know where the shuffle service information is 
in the links you mentioned?

> Shuffle server & client should be versioned to enable smoother upgrade
> --
>
> Key: SPARK-27780
> URL: https://issues.apache.org/jira/browse/SPARK-27780
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Imran Rashid
>Priority: Major
>
> The external shuffle service is often upgraded at a different time than spark 
> itself.  However, this causes problems when the protocol changes between the 
> shuffle service and the spark runtime -- this forces users to upgrade 
> everything simultaneously.
> We should add versioning to the shuffle client & server, so they know what 
> messages the other will support.  This would allow better handling of mixed 
> versions, from better error msgs to allowing some mismatched versions (with 
> reduced capabilities).
> This originally came up in a discussion here: 
> https://github.com/apache/spark/pull/24565#issuecomment-493496466
> There are a few ways we could do the versioning which we still need to 
> discuss:
> 1) Version specified by config.  This allows for mixed versions across the 
> cluster and rolling upgrades.  It also will let a spark 3.0 client talk to a 
> 2.4 shuffle service.  But, may be a nuisance for users to get this right.
> 2) Auto-detection during registration with local shuffle service.  This makes 
> the versioning easy for the end user, and can even handle a 2.4 shuffle 
> service though it does not support the new versioning.  However, it will not 
> handle a rolling upgrade correctly -- if the local shuffle service has been 
> upgraded, but other nodes in the cluster have not, it will get the version 
> wrong.
> 3) Exchange versions per-connection.  When a connection is opened, the server 
> & client could first exchange messages with their versions, so they know how 
> to continue communication after that.
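As a purely illustrative sketch of option 3 (everything below is hypothetical; none of these types exist in Spark), the client could send its protocol version when the connection opens, the server could reply with the highest version both sides support, and both sides would then speak that version:

{code:java}
// Hypothetical sketch of a per-connection version handshake (option 3).
// Class, method and constant names are made up for illustration only.
final class VersionHandshake {
    static final int MY_PROTOCOL_VERSION = 2;

    /** Server side: pick the highest version both ends understand. */
    static int negotiate(int clientVersion) {
        return Math.min(clientVersion, MY_PROTOCOL_VERSION);
    }

    /** Client side: refuse to continue only if no common version exists. */
    static int accept(int negotiatedVersion) {
        if (negotiatedVersion < 1) {
            throw new IllegalStateException("No common shuffle protocol version");
        }
        return negotiatedVersion; // use this version for all later messages
    }
}
{code}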






[jira] [Commented] (SPARK-18626) Concurrent write to table fails from spark

2020-05-26 Thread Shyam (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-18626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117441#comment-17117441
 ] 

Shyam commented on SPARK-18626:
---

Hi, even I am getting the same issue. I am using version 2.4.1; how do I solve this?
[https://stackoverflow.com/questions/62036791/while-writing-to-hdfs-path-getting-error-java-io-ioexception-failed-to-rename]

> Concurrent write to table fails from spark
> --
>
> Key: SPARK-18626
> URL: https://issues.apache.org/jira/browse/SPARK-18626
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Thomas Sebastian
>Priority: Major
>  Labels: bug, bulk-closed
>
> When a Spark job is submitted twice, to execute concurrently using 
> spark-submit, both jobs fail, not allowing concurrent writes (Append) to 
> Hive tables.
> ERROR InsertIntoHadoopFsRelation: Aborting job.
> java.io.IOException: Failed to rename 
> FileStatus{path=hdfs://nameservice1/user/hive/warehouse/aaa.db/table1/_temporary/0/task_201611210639_0017_m_50/part-r-00050-00e873af-e3ab-4730-881f-e8a1b22077e0.gz.parquet;
>  
> isDirectory=false; length=492; replication=3; blocksize=134217728; 
> modification_time=1479728364366; access_time=1479728364062; 
> owner=name; group=hive; permission=rw-rw-r--; isSymlink=false} to 
> hdfs://nameservice1/user/hive/warehouse/aaa.db/table1t/part-r-00050-00e873af-e3ab-4730-881f-e8a1b22077e0.gz.parquet
>   at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:371)
>   at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:384)
>   at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:326)
>   at 
> parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46)
>  






[jira] [Commented] (SPARK-25841) Redesign window function rangeBetween API

2020-05-16 Thread Shyam (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109313#comment-17109313
 ] 

Shyam commented on SPARK-25841:
---

[~rxin] Is this fixed in the latest version, i.e. 2.4.3? Is this issue still 
persisting?

> Redesign window function rangeBetween API
> -
>
> Key: SPARK-25841
> URL: https://issues.apache.org/jira/browse/SPARK-25841
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>Priority: Major
>
> As I was reviewing the Spark API changes for 2.4, I found that through 
> organic, ad-hoc evolution the current API for window functions in Scala is 
> pretty bad.
>   
>  To illustrate the problem, we have two rangeBetween functions in Window 
> class:
>   
> {code:java}
> class Window {
>  def unboundedPreceding: Long
>  ...
>  def rangeBetween(start: Long, end: Long): WindowSpec
>  def rangeBetween(start: Column, end: Column): WindowSpec
> }{code}
>  
>  The Column version of rangeBetween was added in Spark 2.3 because the 
> previous version (Long) could only support integral values and not time 
> intervals. Now in order to support specifying unboundedPreceding in the 
> rangeBetween(Column, Column) API, we added an unboundedPreceding that returns 
> a Column in functions.scala.
>   
>  There are a few issues I have with the API:
>   
>  1. To the end user, this can be just super confusing. Why are there two 
> unboundedPreceding functions, in different classes, that are named the same 
> but return different types?
>   
>  2. Using Column as the parameter signature implies this can be an actual 
> Column, but in practice rangeBetween can only accept literal values.
>   
>  3. We added the new APIs to support intervals, but they don't actually work, 
> because in the implementation we try to validate the start is less than the 
> end, but calendar interval types are not comparable, and as a result we throw 
> a type mismatch exception at runtime: scala.MatchError: CalendarIntervalType 
> (of class org.apache.spark.sql.types.CalendarIntervalType$)
>   
>  4. In order to make interval work, users need to create an interval using 
> CalendarInterval, which is an internal class that has no documentation and no 
> stable API.
>   
>   
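For reference, a minimal Java sketch (the column name is made up) of the Long-based form that the ticket argues should remain the primary API; the frame boundaries here are Long constants, not Columns, so there is no clash with the Column-returning unboundedPreceding() in functions:

{code:java}
import org.apache.spark.sql.Column;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

public class RangeBetweenExample {
    // Long-based frame boundaries: Window.unboundedPreceding() and
    // Window.currentRow() are Long constants, so the call is unambiguous.
    static Column runningTotal() {
        WindowSpec w = Window.orderBy(col("amount"))
                .rangeBetween(Window.unboundedPreceding(), Window.currentRow());
        return sum(col("amount")).over(w);
    }
}
{code}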






[jira] [Commented] (SPARK-6761) Approximate quantile

2020-03-30 Thread Shyam (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-6761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071490#comment-17071490
 ] 

Shyam commented on SPARK-6761:
--

[~mengxr] Can you please advise on this bug: 
https://issues.apache.org/jira/browse/SPARK-31310

> Approximate quantile
> 
>
> Key: SPARK-6761
> URL: https://issues.apache.org/jira/browse/SPARK-6761
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 2.0.0
>
>
> See mailing list discussion: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Approximate-rank-based-statistics-median-95-th-percentile-etc-for-Spark-td11414.html






[jira] [Created] (SPARK-31310) percentile_approx function not working as expected

2020-03-30 Thread Shyam (Jira)
Shyam created SPARK-31310:
-

 Summary: percentile_approx function not working as expected
 Key: SPARK-31310
 URL: https://issues.apache.org/jira/browse/SPARK-31310
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.3, 2.4.1, 2.4.0
 Environment: spark-sql 2.4.1 with Java 8
Reporter: Shyam
 Fix For: 2.4.3


I'm using spark-sql 2.4.1 with Java 8 and I'm trying to compute quantiles 
(percentile 0, percentile 25, etc.) on a given column of a dataframe.

The column values are: 23456.55, 34532.55, 23456.55

When I use the percentile_approx() function, the results do not match those of 
Excel's PERCENTILE.INC() function.

Example: for the data set above (23456.55, 34532.55, 23456.55), computing 
percentile_0, percentile_10, percentile_25, percentile_50, percentile_75, 
percentile_90, percentile_100 respectively:

Using percentile_approx():
23456.55, 23456.55, 23456.55, 23456.55, 23456.55, 23456.55, 23456.55

Using Excel's PERCENTILE.INC():
23456.55, 23456.55, 23456.55, 23456.55, 28994.5503, 32317.3502, 34532.55

How do I get the same percentiles as Excel using the percentile_approx() function?

Please check the above for details.
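If exact, Excel-style results are needed, one option worth trying (a hedged sketch only; the column name "value" is hypothetical) is Spark SQL's exact percentile aggregate rather than percentile_approx, since the exact aggregate interpolates between values:

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.expr;

public class ExactPercentiles {
    // Exact percentiles via the built-in `percentile` aggregate, which
    // interpolates and should track Excel's PERCENTILE.INC more closely.
    // "value" is a placeholder column name.
    static Dataset<Row> compute(Dataset<Row> df) {
        return df.agg(expr(
            "percentile(value, array(0.0, 0.10, 0.25, 0.50, 0.75, 0.90, 1.0)) AS pct"));
    }
}
{code}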

 






[jira] [Updated] (SPARK-31057) approxQuantile function of spark , not taking List as first parameter

2020-03-05 Thread Shyam (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shyam updated SPARK-31057:
--
Shepherd: Sean R. Owen

> approxQuantile function  of spark , not taking List as first parameter
> --
>
> Key: SPARK-31057
> URL: https://issues.apache.org/jira/browse/SPARK-31057
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.1, 2.4.3
> Environment: spark-sql 2.4.3 and Eclipse Neon IDE.
>  
>Reporter: Shyam
>Priority: Major
>
> I am using spark-sql 2.4.1 in my project with Java 8.
> I need to calculate quantiles on some of the (calculated) columns 
> (i.e. con_dist_1, con_dist_2) of the dataframe df given below.
> {{List<String> calcColmns = Arrays.asList("con_dist_1","con_dist_2");}}
> When I try to use the first version of approxQuantile, i.e. 
> approxQuantile(List, List, double), as below:
> Dataset<Row> df = //dataset
> {{List<List<Double>> quants = df.stat().approxQuantile(calcColmns , 
> Array(0.0,0.1,0.5),0.0);}}
> *it gives the error:*
> {quote}The method approxQuantile(String, double[], double) in the type 
> DataFrameStatFunctions is not applicable for the arguments (List, List, 
> double)
> {quote}
> So what is wrong here? I am doing this in my Eclipse IDE. Why is it not 
> accepting a List even though I am passing a List?
> I would really appreciate any help on this.
> More details are added here:
> [https://stackoverflow.com/questions/60550152/issue-with-approxquantile-of-spark-not-recognizing-liststring]
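As far as the public 2.4 API goes, DataFrameStatFunctions.approxQuantile takes arrays rather than Lists, which is why the compiler reports the call as "not applicable". A hedged sketch of a call that compiles (column names assumed from the question):

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class ApproxQuantileExample {
    // The multi-column overload is approxQuantile(String[], double[], double);
    // there is no List-based overload.
    static double[][] quantiles(Dataset<Row> df) {
        String[] cols = {"con_dist_1", "con_dist_2"};
        double[] probabilities = {0.0, 0.1, 0.5};
        double relativeError = 0.0; // 0.0 asks for exact (more expensive) quantiles
        return df.stat().approxQuantile(cols, probabilities, relativeError);
    }
}
{code}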






[jira] [Created] (SPARK-31057) approxQuantile function of spark , not taking List as first parameter

2020-03-05 Thread Shyam (Jira)
Shyam created SPARK-31057:
-

 Summary: approxQuantile function  of spark , not taking 
List as first parameter
 Key: SPARK-31057
 URL: https://issues.apache.org/jira/browse/SPARK-31057
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.3, 2.4.1
 Environment: spark-sql 2.4.3 and Eclipse Neon IDE.

 
Reporter: Shyam


I am using spark-sql 2.4.1 in my project with Java 8.

I need to calculate quantiles on some of the (calculated) columns (i.e. 
con_dist_1, con_dist_2) of the dataframe df given below.
{{List<String> calcColmns = Arrays.asList("con_dist_1","con_dist_2");}}
When I try to use the first version of approxQuantile, i.e. 
approxQuantile(List, List, double), as below:

Dataset<Row> df = //dataset
{{List<List<Double>> quants = df.stat().approxQuantile(calcColmns , 
Array(0.0,0.1,0.5),0.0);}}
*it gives the error:*
{quote}The method approxQuantile(String, double[], double) in the type 
DataFrameStatFunctions is not applicable for the arguments (List, List, double)
{quote}
So what is wrong here? I am doing this in my Eclipse IDE. Why is it not 
accepting a List even though I am passing a List?

I would really appreciate any help on this.

More details are added here:

[https://stackoverflow.com/questions/60550152/issue-with-approxquantile-of-spark-not-recognizing-liststring]






[jira] [Commented] (SPARK-3875) Add TEMP DIRECTORY configuration

2020-02-25 Thread Shyam (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044467#comment-17044467
 ] 

Shyam commented on SPARK-3875:
--

[~srowen], I am still getting this error. It is suggested that 
https://issues.apache.org/jira/browse/SPARK-26825 is the fix, but how do I 
implement it in my spark-streaming application?

> Add TEMP DIRECTORY configuration
> 
>
> Key: SPARK-3875
> URL: https://issues.apache.org/jira/browse/SPARK-3875
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Patrick Liu
>Priority: Major
>
> Currently, Spark uses "java.io.tmpdir" to find the /tmp/ directory.
> Then, the /tmp/ directory is used to:
> 1. Set up the HTTP File Server
> 2. Hold the broadcast directory
> 3. Fetch dependency files or jars for executors
> The size of the /tmp/ directory will keep growing, leaving less and less free 
> space on the system disk.
> I think we could add a configuration "spark.tmp.dir" in conf/spark-env.sh or 
> conf/spark-defaults.conf to set this particular directory. Let's say, set the 
> directory to a data disk.
> If "spark.tmp.dir" is not set, use the default "java.io.tmpdir".






[jira] [Commented] (SPARK-26825) Spark Structure Streaming job failing when submitted in cluster mode

2020-02-24 Thread Shyam (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043593#comment-17043593
 ] 

Shyam commented on SPARK-26825:
---

[~asdaraujo] [~gsomogyi] I am facing the same issue in spark-sql 2.4.1. How can 
I replace/override what is returned by this Utils.createTempDir code?

Any more clues?

> Spark Structure Streaming job failing when submitted in cluster mode
> 
>
> Key: SPARK-26825
> URL: https://issues.apache.org/jira/browse/SPARK-26825
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Andre Araujo
>Priority: Major
>
> I have a structured streaming job that runs successfully when launched in 
> "client" mode. However, when launched in "cluster" mode it fails with the 
> following weird messages on the error log. Note that the path in the error 
> message is actually a local filesystem path that has been mistakenly prefixed 
> with a {{hdfs://}} scheme.
> {code}
> 19/02/01 12:53:14 ERROR streaming.StreamMetadata: Error writing stream 
> metadata StreamMetadata(68f9fb30-5853-49b4-b192-f1e0483e0d95) to 
> hdfs://ns1/data/yarn/nm/usercache/root/appcache/application_1548823131831_0160/container_1548823131831_0160_02_01/tmp/temporary-3789423a-6ded-4084-aab3-3b6301c34e07/metadata
> org.apache.hadoop.security.AccessControlException: Permission denied: 
> user=root, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:256)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:194)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1853)
> {code}
> I dug a little bit into this and here's what I think it's going on:
> # When a new streaming query is created, the {{StreamingQueryManager}} 
> determines the checkpoint location 
> [here|https://github.com/apache/spark/blob/d811369ce23186cbb3208ad665e15408e13fea87/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala#L216].
>  If neither the user nor the Spark conf specify a checkpoint location, the 
> location is returned by a call to {{Utils.createTempDir(namePrefix = 
> s"temporary").getCanonicalPath}}. 
>Here, I see two issues:
> #* The canonical path returned by {{Utils.createTempDir}} does *not* have a 
> scheme ({{hdfs://}} or {{file://}}), so, it's ambiguous as to what type of 
> file system the path belongs to.
> #* Also note that the path returned by the {{Utils.createTempDir}} call is a 
> local path, not a HDFS path, as the paths returned by the other two 
> conditions. I executed {{Utils.createTempDir}} in a test job, both in cluster 
> and client modes, and the results are these:
> {code}
> *Client mode:*
> java.io.tmpdir=/tmp
> createTempDir(namePrefix = s"temporary") => 
> /tmp/temporary-c51f1466-fd50-40c7-b136-1f2f06672e25
> *Cluster mode:*
> java.io.tmpdir=/yarn/nm/usercache/root/appcache/application_154906473_0029/container_154906473_0029_01_01/tmp/
> createTempDir(namePrefix = s"temporary") => 
> /yarn/nm/usercache/root/appcache/application_154906473_0029/container_154906473_0029_01_01/tmp/temporary-47c13b28-14bd-4d1b-8acc-3e445948415e
> {code}
> # This temporary checkpoint location is then [passed to the 
> constructor|https://github.com/apache/spark/blob/d811369ce23186cbb3208ad665e15408e13fea87/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala#L276]
>  of the {{MicroBatchExecution}} instance
> # This is the point where [{{resolvedCheckpointRoot}} is 
> calculated|https://github.com/apache/spark/blob/755f9c20761e3db900c6c2b202cd3d2c5bbfb7c0/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L89].
>  Here, it's where things start to break: since the path returned by 
> {{Utils.createTempDir}} doesn't have a scheme, and since HDFS is the default 
> filesystem, the code resolves the path as being a HDFS path, rather than a 
> local one, as shown below:
> {code}
> scala> import org.apache.hadoop.fs.Path
> import org.apache.hadoop.fs.Path
> scala> // value returned by the Utils.createTempDir method
> scala> val checkpointRoot = 
> "/yarn/nm/usercache/root/appcache/application_154906473_0029/container_154906473_0029_01_01/tmp/temporary-47c13b28-14bd-4d1b-8acc-3e445948415e"
> checkpointRoot: String = 
> /yarn/nm/usercache/root/appcache/application_154906473_0029/container_154906473_0029_01_01/tmp/temporary-47c13b28-14bd-4d1b-8acc-3e445948415e
> sca
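
A hedged workaround sketch for the ambiguity described above (the HDFS path is made up): give the query an explicit, scheme-qualified checkpoint location so the temporary Utils.createTempDir path is never used.

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.streaming.StreamingQuery;

public class ExplicitCheckpoint {
    // Passing a scheme-qualified checkpointLocation avoids the ambiguous
    // local-vs-HDFS path resolution (paths are hypothetical).
    static StreamingQuery start(Dataset<Row> df) throws Exception {
        return df.writeStream()
                .format("parquet")
                .option("path", "hdfs://ns1/user/spark/output")
                .option("checkpointLocation", "hdfs://ns1/user/spark/checkpoints/my-query")
                .start();
    }
}
{code}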

[jira] [Commented] (SPARK-23829) spark-sql-kafka source in spark 2.3 causes reading stream failure frequently

2020-02-04 Thread Shyam (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029696#comment-17029696
 ] 

Shyam commented on SPARK-23829:
---

[~gsomogyi] I am new to Kafka and Spark. What do you mean by a vanilla Spark 
version here?

> spark-sql-kafka source in spark 2.3 causes reading stream failure frequently
> 
>
> Key: SPARK-23829
> URL: https://issues.apache.org/jira/browse/SPARK-23829
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Norman Bai
>Priority: Major
> Fix For: 2.4.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In spark 2.3 , it provides a source "spark-sql-kafka-0-10_2.11".
>  
> When I wanted to read from my kafka-0.10.2.1 cluster, it frequently threw the 
> error "*java.util.concurrent.TimeoutException: Cannot fetch record  for offset 
> in 12000 milliseconds*", and the job thus failed.
>  
> I searched on Google & Stack Overflow for a while, and found many other people 
> who got this exception too, and nobody gave an answer.
>  
> I debugged the source code and found nothing, but I guess it's because of the 
> kafka-clients dependency that spark-sql-kafka-0-10_2.11 is using.
>  
> {code:java}
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
>   <version>2.3.0</version>
>   <exclusions>
>     <exclusion>
>       <artifactId>kafka-clients</artifactId>
>       <groupId>org.apache.kafka</groupId>
>     </exclusion>
>   </exclusions>
> </dependency>
> <dependency>
>   <groupId>org.apache.kafka</groupId>
>   <artifactId>kafka-clients</artifactId>
>   <version>0.10.2.1</version>
> </dependency>
> {code}
> I excluded it from Maven, added another version, reran the code, and 
> now it works.
>  
> I guess something is wrong with kafka-clients 0.10.0.1 working with 
> Kafka 0.10.2.1, or with other Kafka versions. 
>  
> Hope for an explanation.
> Here is the error stack.
> {code:java}
> [ERROR] 2018-03-30 13:34:11,404 [stream execution thread for [id = 
> 83076cf1-4bf0-4c82-a0b3-23d8432f5964, runId = 
> b3e18aa6-358f-43f6-a077-e34db0822df6]] 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution logError - Query 
> [id = 83076cf1-4bf0-4c82-a0b3-23d8432f5964, runId = 
> b3e18aa6-358f-43f6-a077-e34db0822df6] terminated with error
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in 
> stage 0.0 failed 1 times, most recent failure: Lost task 6.0 in stage 0.0 
> (TID 6, localhost, executor driver): java.util.concurrent.TimeoutException: 
> Cannot fetch record for offset 6481521 in 12 milliseconds
> at 
> org.apache.spark.sql.kafka010.CachedKafkaConsumer.org$apache$spark$sql$kafka010$CachedKafkaConsumer$$fetchData(CachedKafkaConsumer.scala:230)
> at 
> org.apache.spark.sql.kafka010.CachedKafkaConsumer$$anonfun$get$1.apply(CachedKafkaConsumer.scala:122)
> at 
> org.apache.spark.sql.kafka010.CachedKafkaConsumer$$anonfun$get$1.apply(CachedKafkaConsumer.scala:106)
> at 
> org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
> at 
> org.apache.spark.sql.kafka010.CachedKafkaConsumer.runUninterruptiblyIfPossible(CachedKafkaConsumer.scala:68)
> at 
> org.apache.spark.sql.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:106)
> at 
> org.apache.spark.sql.kafka010.KafkaSourceRDD$$anon$1.getNext(KafkaSourceRDD.scala:157)
> at 
> org.apache.spark.sql.kafka010.KafkaSourceRDD$$anon$1.getNext(KafkaSourceRDD.scala:148)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
> at 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:107)
> at 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:105)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun

[jira] [Commented] (SPARK-25466) Documentation does not specify how to set Kafka consumer cache capacity for SS

2019-12-11 Thread Shyam (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993557#comment-16993557
 ] 

Shyam commented on SPARK-25466:
---

[~gsomogyi] I raised a ticket: https://issues.apache.org/jira/browse/SPARK-30222. 
Please let me know if there is anything specific you are looking for in order to 
suggest a fix.

> Documentation does not specify how to set Kafka consumer cache capacity for SS
> --
>
> Key: SPARK-25466
> URL: https://issues.apache.org/jira/browse/SPARK-25466
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Patrick McGloin
>Priority: Minor
>
> When hitting this warning with SS:
> 19-09-2018 12:05:27 WARN  CachedKafkaConsumer:66 - KafkaConsumer cache 
> hitting max capacity of 64, removing consumer for 
> CacheKey(spark-kafka-source-e06c9676-32c6-49c4-80a9-2d0ac4590609--694285871-executor,MyKafkaTopic-30)
> If you Google you get to this page:
> https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html
> Which is for Spark Streaming and says to use this config item to adjust the 
> capacity: "spark.streaming.kafka.consumer.cache.maxCapacity".
> This is a bit confusing as SS uses a different config item: 
> "spark.sql.kafkaConsumerCache.capacity"
> Perhaps the SS Kafka documentation should talk about the consumer cache 
> capacity?  Perhaps here?
> https://spark.apache.org/docs/2.2.0/structured-streaming-kafka-integration.html
> Or perhaps the warning message should reference the config item.  E.g
> 19-09-2018 12:05:27 WARN  CachedKafkaConsumer:66 - KafkaConsumer cache 
> hitting max capacity of 64, removing consumer for 
> CacheKey(spark-kafka-source-e06c9676-32c6-49c4-80a9-2d0ac4590609--694285871-executor,MyKafkaTopic-30).
>   *The cache size can be adjusted with the setting 
> "spark.sql.kafkaConsumerCache.capacity".*






[jira] [Created] (SPARK-30222) Still getting KafkaConsumer cache hitting max capacity of 64, removing consumer for CacheKe

2019-12-11 Thread Shyam (Jira)
Shyam created SPARK-30222:
-

 Summary: Still getting KafkaConsumer cache hitting max capacity of 
64, removing consumer for CacheKe
 Key: SPARK-30222
 URL: https://issues.apache.org/jira/browse/SPARK-30222
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.4.1
 Environment: {{Below are the logs.}}

2019-12-11 08:33:31,504 [Executor task launch worker for task 1050] WARN 
org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting 
max capacity of 64, removing consumer for 
CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-21)
2019-12-11 08:33:32,493 [Executor task launch worker for task 1051] WARN 
org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting 
max capacity of 64, removing consumer for 
CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-5)
2019-12-11 08:33:32,570 [Executor task launch worker for task 1052] WARN 
org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting 
max capacity of 64, removing consumer for 
CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-9)
2019-12-11 08:33:33,441 [Executor task launch worker for task 1053] WARN 
org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting 
max capacity of 64, removing consumer for 
CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-17)
2019-12-11 08:33:33,619 [Executor task launch worker for task 1054] WARN 
org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting 
max capacity of 64, removing consumer for 
CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-2)
2019-12-11 08:33:34,474 [Executor task launch worker for task 1055] WARN 
org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting 
max capacity of 64, removing consumer for 
CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-10)
2019-12-11 08:33:35,006 [Executor task launch worker for task 1056] WARN 
org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting 
max capacity of 64, removing consumer for 
CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-6)
2019-12-11 08:33:36,326 [Executor task launch worker for task 1057] WARN 
org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting 
max capacity of 64, removing consumer for 
CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-14)
2019-12-11 08:33:36,634 [Executor task launch worker for task 1058] WARN 
org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting 
max capacity of 64, removing consumer for 
CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-0)
2019-12-11 08:33:37,496 [Executor task launch worker for task 1059] WARN 
org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting 
max capacity of 64, removing consumer for 
CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-19)
2019-12-11 08:33:39,183 [stream execution thread for [id = 
b3aec196-e4f2-4ef9-973b-d5685eba917e, runId = 
5c35a63a-16ad-4899-b732-1019397770bd]] WARN 
org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor - Current batch 
is falling behind. The trigger interval is 15000 milliseconds, but spent 63438 
milliseconds
Reporter: Shyam
 Fix For: 2.4.1


I am using spark-sql 2.4.1 with Kafka 0.10.

While I try to consume data with the consumer, it still gives the error below 
even after setting

.option("spark.sql.kafkaConsumerCache.capacity",128)

{{Dataset<Row> df = sparkSession}}
{{       .readStream()}}
{{       .format("kafka")}}
{{       .option("kafka.bootstrap.servers", SERVERS)}}
{{       .option("subscribe", TOPIC)}}
{{       .option("spark.sql.kafkaConsumerCache.capacity",128)}}
{{       .load();}}
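As far as I understand, spark.sql.kafkaConsumerCache.capacity is a Spark configuration, not a Kafka source option, so a value passed through readStream().option(...) is not picked up. A hedged sketch of setting it on the Spark session instead (broker, app and topic names are placeholders, and raising the value only helps if it covers the number of topic partitions an executor ends up handling):

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ConsumerCacheCapacity {
    static Dataset<Row> readInbound() {
        SparkSession spark = SparkSession.builder()
                .appName("inbound-reader")
                // Session-level conf (or spark-submit --conf), not a readStream option.
                .config("spark.sql.kafkaConsumerCache.capacity", "128")
                .getOrCreate();

        return spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092")   // placeholder
                .option("subscribe", "COMPANY_TRANSACTIONS_INBOUND") // topic from the logs
                .load();
    }
}
{code}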






[jira] [Commented] (SPARK-25466) Documentation does not specify how to set Kafka consumer cache capacity for SS

2019-11-06 Thread Shyam (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968435#comment-16968435
 ] 

Shyam commented on SPARK-25466:
---

[~gsomogyi] I am still facing the same issue; how do I fix it? I tried the 
things mentioned here on Stack Overflow: 
[https://stackoverflow.com/questions/58456939/how-to-set-spark-consumer-cache-to-fix-kafkaconsumer-cache-hitting-max-capaci]
 

> Documentation does not specify how to set Kafka consumer cache capacity for SS
> --
>
> Key: SPARK-25466
> URL: https://issues.apache.org/jira/browse/SPARK-25466
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Patrick McGloin
>Priority: Minor
>
> When hitting this warning with SS:
> 19-09-2018 12:05:27 WARN  CachedKafkaConsumer:66 - KafkaConsumer cache 
> hitting max capacity of 64, removing consumer for 
> CacheKey(spark-kafka-source-e06c9676-32c6-49c4-80a9-2d0ac4590609--694285871-executor,MyKafkaTopic-30)
> If you Google you get to this page:
> https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html
> Which is for Spark Streaming and says to use this config item to adjust the 
> capacity: "spark.streaming.kafka.consumer.cache.maxCapacity".
> This is a bit confusing as SS uses a different config item: 
> "spark.sql.kafkaConsumerCache.capacity"
> Perhaps the SS Kafka documentation should talk about the consumer cache 
> capacity?  Perhaps here?
> https://spark.apache.org/docs/2.2.0/structured-streaming-kafka-integration.html
> Or perhaps the warning message should reference the config item.  E.g
> 19-09-2018 12:05:27 WARN  CachedKafkaConsumer:66 - KafkaConsumer cache 
> hitting max capacity of 64, removing consumer for 
> CacheKey(spark-kafka-source-e06c9676-32c6-49c4-80a9-2d0ac4590609--694285871-executor,MyKafkaTopic-30).
>   *The cache size can be adjusted with the setting 
> "spark.sql.kafkaConsumerCache.capacity".*






[jira] [Commented] (SPARK-29710) Seeing offsets not resetting even when reset policy is configured explicitly

2019-11-01 Thread Shyam (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964696#comment-16964696
 ] 

Shyam commented on SPARK-29710:
---

@Gabor Somogyi Can you please help me? What is wrong here?

> Seeing offsets not resetting even when reset policy is configured explicitly
> 
>
> Key: SPARK-29710
> URL: https://issues.apache.org/jira/browse/SPARK-29710
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.1
> Environment: Windows 10, Eclipse Neon
>Reporter: Shyam
>Priority: Major
>
>  
>  Even after setting *"auto.offset.reset" to "latest"*, I am getting the 
> error below:
>  
> org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of 
> range with no configured reset policy for partitions: 
> \{COMPANY_TRANSACTIONS_INBOUND-16=168}org.apache.kafka.clients.consumer.OffsetOutOfRangeException:
>  Offsets out of range with no configured reset policy for partitions: 
> \{COMPANY_TRANSACTIONS_INBOUND-16=168} at 
> org.apache.kafka.clients.consumer.internals.Fetcher.throwIfOffsetOutOfRange(Fetcher.java:348)
>  at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:396)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:999)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937) 
> at 
> org.apache.spark.sql.kafka010.InternalKafkaConsumer.fetchData(KafkaDataConsumer.scala:470)
>  at 
> org.apache.spark.sql.kafka010.InternalKafkaConsumer.org$apache$spark$sql$kafka010$InternalKafkaConsumer$$fetchRecord(KafkaDataConsumer.scala:361)
>  at 
> org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:251)
>  at 
> org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:234)
>  at 
> org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
>  at 
> org.apache.spark.sql.kafka010.InternalKafkaConsumer.runUninterruptiblyIfPossible(KafkaDataConsumer.scala:209)
>  at 
> org.apache.spark.sql.kafka010.InternalKafkaConsumer.get(KafkaDataConsumer.scala:234)
>  
> [https://stackoverflow.com/questions/58653885/even-after-setting-auto-offset-reset-to-latest-getting-error-offsetoutofrang]






[jira] [Created] (SPARK-29710) Seeing offsets not resetting even when reset policy is configured explicitly

2019-10-31 Thread Shyam (Jira)
Shyam created SPARK-29710:
-

 Summary: Seeing offsets not resetting even when reset policy is 
configured explicitly
 Key: SPARK-29710
 URL: https://issues.apache.org/jira/browse/SPARK-29710
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.4.1
 Environment: Windows 10, Eclipse Neon
Reporter: Shyam


 

 Even after setting *"auto.offset.reset" to "latest"*, I am getting the error below:

 

org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of 
range with no configured reset policy for partitions: 
\{COMPANY_TRANSACTIONS_INBOUND-16=168}org.apache.kafka.clients.consumer.OffsetOutOfRangeException:
 Offsets out of range with no configured reset policy for partitions: 
\{COMPANY_TRANSACTIONS_INBOUND-16=168} at 
org.apache.kafka.clients.consumer.internals.Fetcher.throwIfOffsetOutOfRange(Fetcher.java:348)
 at 
org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:396)
 at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:999)
 at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937) at 
org.apache.spark.sql.kafka010.InternalKafkaConsumer.fetchData(KafkaDataConsumer.scala:470)
 at 
org.apache.spark.sql.kafka010.InternalKafkaConsumer.org$apache$spark$sql$kafka010$InternalKafkaConsumer$$fetchRecord(KafkaDataConsumer.scala:361)
 at 
org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:251)
 at 
org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:234)
 at 
org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
 at 
org.apache.spark.sql.kafka010.InternalKafkaConsumer.runUninterruptiblyIfPossible(KafkaDataConsumer.scala:209)
 at 
org.apache.spark.sql.kafka010.InternalKafkaConsumer.get(KafkaDataConsumer.scala:234)

 

[https://stackoverflow.com/questions/58653885/even-after-setting-auto-offset-reset-to-latest-getting-error-offsetoutofrang]
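
Worth noting, as a hedged aside: the Structured Streaming Kafka source manages offsets itself and does not honor the consumer's auto.offset.reset; the analogous knobs are the source options startingOffsets and failOnDataLoss. A minimal sketch (broker name is a placeholder):

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KafkaSourceOptions {
    static Dataset<Row> read(SparkSession spark) {
        return spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092")   // placeholder
                .option("subscribe", "COMPANY_TRANSACTIONS_INBOUND")
                .option("startingOffsets", "latest")  // used instead of auto.offset.reset
                .option("failOnDataLoss", "false")    // tolerate offsets aged out by retention
                .load();
    }
}
{code}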






[jira] [Commented] (SPARK-22000) org.codehaus.commons.compiler.CompileException: toString method is not declared

2019-10-28 Thread Shyam (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-22000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961159#comment-16961159
 ] 

Shyam commented on SPARK-22000:
---

[~srowen] I am using Spark 2.4.1 and getting the same error. For more 
details, please check this: 
[https://stackoverflow.com/questions/58593215/inserting-into-cassandra-table-from-spark-dataframe-results-in-org-codehaus-comm]
How do I fix this?

> org.codehaus.commons.compiler.CompileException: toString method is not 
> declared
> ---
>
> Key: SPARK-22000
> URL: https://issues.apache.org/jira/browse/SPARK-22000
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: taiho choi
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: testcase.zip
>
>
> The error message says that toString is not declared on "value13", which is of 
> "long" type in the generated code.
> I think value13 should be of Long type.
> == Error message
> Caused by: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 70, Column 32: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 70, Column 32: A method named "toString" is not declared in any enclosing 
> class nor any supertype, nor through a static import
> /* 033 */   private void apply1_2(InternalRow i) {
> /* 034 */
> /* 035 */
> /* 036 */ boolean isNull11 = i.isNullAt(1);
> /* 037 */ UTF8String value11 = isNull11 ? null : (i.getUTF8String(1));
> /* 038 */ boolean isNull10 = true;
> /* 039 */ java.lang.String value10 = null;
> /* 040 */ if (!isNull11) {
> /* 041 */
> /* 042 */   isNull10 = false;
> /* 043 */   if (!isNull10) {
> /* 044 */
> /* 045 */ Object funcResult4 = null;
> /* 046 */ funcResult4 = value11.toString();
> /* 047 */
> /* 048 */ if (funcResult4 != null) {
> /* 049 */   value10 = (java.lang.String) funcResult4;
> /* 050 */ } else {
> /* 051 */   isNull10 = true;
> /* 052 */ }
> /* 053 */
> /* 054 */
> /* 055 */   }
> /* 056 */ }
> /* 057 */ javaBean.setApp(value10);
> /* 058 */
> /* 059 */
> /* 060 */ boolean isNull13 = i.isNullAt(12);
> /* 061 */ long value13 = isNull13 ? -1L : (i.getLong(12));
> /* 062 */ boolean isNull12 = true;
> /* 063 */ java.lang.String value12 = null;
> /* 064 */ if (!isNull13) {
> /* 065 */
> /* 066 */   isNull12 = false;
> /* 067 */   if (!isNull12) {
> /* 068 */
> /* 069 */ Object funcResult5 = null;
> /* 070 */ funcResult5 = value13.toString();
> /* 071 */
> /* 072 */ if (funcResult5 != null) {
> /* 073 */   value12 = (java.lang.String) funcResult5;
> /* 074 */ } else {
> /* 075 */   isNull12 = true;
> /* 076 */ }
> /* 077 */
> /* 078 */
> /* 079 */   }
> /* 080 */ }
> /* 081 */ javaBean.setReasonCode(value12);
> /* 082 */
> /* 083 */   }
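
A hedged workaround that has worked for this class of error (a bean field typed String backed by a bigint column, so the generated code calls toString() on a primitive long): cast the column to string before applying the bean encoder. The column names are taken from the generated code above, but the exact schema and bean are assumptions.

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

public class BeanEncoderWorkaround {
    // If setReasonCode(String) is backed by a bigint column, codegen tries
    // `value13.toString()` on a primitive long and fails to compile.
    // Casting the column first keeps the codegen on UTF8String.toString().
    static Dataset<MyBean> toBeans(Dataset<Row> df) {
        return df.withColumn("reasonCode", df.col("reasonCode").cast("string"))
                 .as(Encoders.bean(MyBean.class));
    }

    public static class MyBean implements java.io.Serializable {
        private String app;
        private String reasonCode;
        public String getApp() { return app; }
        public void setApp(String app) { this.app = app; }
        public String getReasonCode() { return reasonCode; }
        public void setReasonCode(String reasonCode) { this.reasonCode = reasonCode; }
    }
}
{code}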






[jira] [Commented] (SPARK-10420) Implementing Reactive Streams based Spark Streaming Receiver

2016-02-22 Thread Shyam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157001#comment-15157001
 ] 

Shyam commented on SPARK-10420:
---

Hi all, is there any activity around this ticket?

> Implementing Reactive Streams based Spark Streaming Receiver
> 
>
> Key: SPARK-10420
> URL: https://issues.apache.org/jira/browse/SPARK-10420
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Nilanjan Raychaudhuri
>Priority: Minor
>
> Hello TD,
> This is probably the last bit of the back-pressure story, implementing 
> ReactiveStreams based Spark streaming receivers. After discussing about this 
> with my Typesafe team we came up with the following design document
> https://docs.google.com/document/d/1lGQKXfNznd5SPuQigvCdLsudl-gcvWKuHWr0Bpn3y30/edit?usp=sharing
> Could you please take a look at this when you get a chance?
> Thanks
> Nilanjan






[jira] [Commented] (SPARK-4354) 14/11/12 09:39:00 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, HYD-RNDNW-VFRCO-RCORE2): java.lang.NoClassDefFoundError: Could not initialize class org.xerial

2014-11-14 Thread Shyam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213375#comment-14213375
 ] 

Shyam commented on SPARK-4354:
--

Will somebody please enlighten me on the mentioned exception and how to get rid 
of it? Thank you.

> 14/11/12 09:39:00 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, 
> HYD-RNDNW-VFRCO-RCORE2): java.lang.NoClassDefFoundError: Could not initialize 
> class org.xerial.snappy.Snappy
> --
>
> Key: SPARK-4354
> URL: https://issues.apache.org/jira/browse/SPARK-4354
> Project: Spark
>  Issue Type: Question
>  Components: Examples
>Affects Versions: 1.1.0
> Environment: Linux
>Reporter: Shyam
>  Labels: newbie
> Attachments: client-exception.txt
>
>
> Prebuilt Spark for Hadoop 2.4 installed on 4 Red Hat Linux machines. 
> Standalone cluster mode.
> Machine 1(Master)
> Machine 2(Worker node 1)
> Machine 3(Worker node 2)
> Machine 4(Client for executing spark examples)
> I ran the command mentioned below on Machine 4 and got the exception 
> mentioned in the summary of this issue.
> sh spark-submit  --class org.apache.spark.examples.SparkPi --jars 
> /FS/lib/spark-assembly-1.1.0-hadoop2.4.0.jar  --master spark://MasterIP:7077 
> --deploy-mode client /FS/lib/spark-examples-1.1.0-hadoop2.4.0.jar 10
> java.lang.NoClassDefFoundError: Could not initialize class 
> org.xerial.snappy.Snappy






[jira] [Updated] (SPARK-4354) 14/11/12 09:39:00 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, HYD-RNDNW-VFRCO-RCORE2): java.lang.NoClassDefFoundError: Could not initialize class org.xerial.s

2014-11-11 Thread Shyam (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shyam updated SPARK-4354:
-
Attachment: client-exception.txt

The process of execution and mentioned exception can be found in the text file 
attached.

> 14/11/12 09:39:00 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, 
> HYD-RNDNW-VFRCO-RCORE2): java.lang.NoClassDefFoundError: Could not initialize 
> class org.xerial.snappy.Snappy
> --
>
> Key: SPARK-4354
> URL: https://issues.apache.org/jira/browse/SPARK-4354
> Project: Spark
>  Issue Type: Question
>  Components: Examples
>Affects Versions: 1.1.0
> Environment: Linux
>Reporter: Shyam
>  Labels: newbie
> Attachments: client-exception.txt
>
>
> Prebuilt Spark for Hadoop 2.4 installed on 4 Red Hat Linux machines. 
> Standalone cluster mode.
> Machine 1(Master)
> Machine 2(Worker node 1)
> Machine 3(Worker node 2)
> Machine 4(Client for executing spark examples)
> I ran the command mentioned below on Machine 4 and got the exception 
> mentioned in the summary of this issue.
> sh spark-submit  --class org.apache.spark.examples.SparkPi --jars 
> /FS/lib/spark-assembly-1.1.0-hadoop2.4.0.jar  --master spark://MasterIP:7077 
> --deploy-mode client /FS/lib/spark-examples-1.1.0-hadoop2.4.0.jar 10
> java.lang.NoClassDefFoundError: Could not initialize class 
> org.xerial.snappy.Snappy






[jira] [Created] (SPARK-4354) 14/11/12 09:39:00 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, HYD-RNDNW-VFRCO-RCORE2): java.lang.NoClassDefFoundError: Could not initialize class org.xerial.s

2014-11-11 Thread Shyam (JIRA)
Shyam created SPARK-4354:


 Summary: 14/11/12 09:39:00 WARN TaskSetManager: Lost task 5.0 in 
stage 0.0 (TID 5, HYD-RNDNW-VFRCO-RCORE2): java.lang.NoClassDefFoundError: 
Could not initialize class org.xerial.snappy.Snappy
 Key: SPARK-4354
 URL: https://issues.apache.org/jira/browse/SPARK-4354
 Project: Spark
  Issue Type: Question
  Components: Examples
Affects Versions: 1.1.0
 Environment: Linux
Reporter: Shyam


Prebuilt Spark for Hadoop 2.4 installed on 4 Red Hat Linux machines. 

Standalone cluster mode.

Machine 1(Master)
Machine 2(Worker node 1)
Machine 3(Worker node 2)
Machine 4(Client for executing spark examples)

I ran the command mentioned below on Machine 4 and got the exception mentioned 
in the summary of this issue.

sh spark-submit  --class org.apache.spark.examples.SparkPi --jars 
/FS/lib/spark-assembly-1.1.0-hadoop2.4.0.jar  --master spark://MasterIP:7077 
--deploy-mode client /FS/lib/spark-examples-1.1.0-hadoop2.4.0.jar 10


java.lang.NoClassDefFoundError: Could not initialize class 
org.xerial.snappy.Snappy





