[jira] [Commented] (SPARK-27780) Shuffle server & client should be versioned to enable smoother upgrade
[ https://issues.apache.org/jira/browse/SPARK-27780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418909#comment-17418909 ] Shyam commented on SPARK-27780: --- [~comma337] Can you please let me know where the shuffle service information is in the links you mentioned? > Shuffle server & client should be versioned to enable smoother upgrade > -- > > Key: SPARK-27780 > URL: https://issues.apache.org/jira/browse/SPARK-27780 > Project: Spark > Issue Type: New Feature > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Imran Rashid >Priority: Major > > The external shuffle service is often upgraded at a different time than Spark > itself. However, this causes problems when the protocol changes between the > shuffle service and the Spark runtime -- this forces users to upgrade > everything simultaneously. > We should add versioning to the shuffle client & server, so they know what > messages the other will support. This would allow better handling of mixed > versions, from better error messages to allowing some mismatched versions (with > reduced capabilities). > This originally came up in a discussion here: > https://github.com/apache/spark/pull/24565#issuecomment-493496466 > There are a few ways we could do the versioning which we still need to > discuss: > 1) Version specified by config. This allows for mixed versions across the > cluster and rolling upgrades. It also will let a Spark 3.0 client talk to a > 2.4 shuffle service. But it may be a nuisance for users to get this right. > 2) Auto-detection during registration with the local shuffle service. This makes > the versioning easy for the end user, and can even handle a 2.4 shuffle > service even though it does not support the new versioning. However, it will not > handle a rolling upgrade correctly -- if the local shuffle service has been > upgraded, but other nodes in the cluster have not, it will get the version > wrong. > 3) Exchange versions per-connection. When a connection is opened, the server > & client could first exchange messages with their versions, so they know how > to continue communication after that. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
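To make option 3 concrete, here is a minimal sketch of a per-connection version exchange. The class, constants, and method below are hypothetical illustrations, not Spark's actual network protocol types:

{code:java}
// Hypothetical sketch of option 3: when a connection opens, each side sends
// its protocol version first, then both continue with the highest version
// they have in common.
public final class ShuffleVersionHandshake {

    // Hypothetical version constants; not Spark's real protocol numbers.
    static final int MIN_SUPPORTED_VERSION = 1;
    static final int CURRENT_VERSION = 2;

    /**
     * Called after receiving the peer's version message. Returns the version
     * both sides will speak, or fails fast with a clear error message instead
     * of a cryptic deserialization failure later on.
     */
    static int negotiate(int peerVersion) {
        if (peerVersion < MIN_SUPPORTED_VERSION) {
            throw new IllegalStateException(
                "Shuffle protocol version " + peerVersion
                + " is older than the minimum supported version "
                + MIN_SUPPORTED_VERSION + "; please upgrade the peer.");
        }
        return Math.min(CURRENT_VERSION, peerVersion);
    }
}
{code}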
[jira] [Commented] (SPARK-18626) Concurrent write to table fails from spark
[ https://issues.apache.org/jira/browse/SPARK-18626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117441#comment-17117441 ] Shyam commented on SPARK-18626: --- Hi, even I am getting the same issue. I am using version 2.4.1; how do I solve this? [https://stackoverflow.com/questions/62036791/while-writing-to-hdfs-path-getting-error-java-io-ioexception-failed-to-rename] > Concurrent write to table fails from spark > -- > > Key: SPARK-18626 > URL: https://issues.apache.org/jira/browse/SPARK-18626 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Thomas Sebastian >Priority: Major > Labels: bug, bulk-closed > > When a Spark job is submitted twice, to execute concurrently using > spark-submit, both jobs fail, not allowing concurrent write (Append) to Hive > tables. > ERROR InsertIntoHadoopFsRelation: Aborting job. > java.io.IOException: Failed to rename > FileStatus{path=hdfs://nameservice1/user/hive/warehouse/aaa.db/table1/_temporary/0/task_201611210639_0017_m_50/part-r-00050-00e873af-e3ab-4730-881f-e8a1b22077e0.gz.parquet; > > isDirectory=false; length=492; replication=3; blocksize=134217728; > modification_time=1479728364366; access_time=1479728364062; > owner=name; group=hive; permission=rw-rw-r--; isSymlink=false} to > hdfs://nameservice1/user/hive/warehouse/aaa.db/table1t/part-r-00050-00e873af-e3ab-4730-881f-e8a1b22077e0.gz.parquet > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:371) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:384) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:326) > at > parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46) > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
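For what it's worth, the collision happens because with the default FileOutputCommitter both jobs stage their output under the same <table>/_temporary directory, so whichever job commits second renames files the first job has already moved. A minimal hedged sketch of one workaround, assuming each job can append to its own subdirectory; the paths and the jobId argument are hypothetical:

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class ConcurrentAppendSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("concurrent-append-sketch")
                .getOrCreate();

        // Hypothetical input; stands in for whatever each job computes.
        Dataset<Row> df = spark.read().parquet("/data/input");

        // Each concurrent job writes to its own directory, so each gets its
        // own _temporary staging dir and the commit renames cannot collide.
        String jobId = args.length > 0 ? args[0] : "job-1"; // hypothetical id
        df.write()
          .mode(SaveMode.Append)
          .parquet("/user/hive/warehouse/aaa.db/table1/batch=" + jobId);

        spark.stop();
    }
}
{code}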
[jira] [Commented] (SPARK-25841) Redesign window function rangeBetween API
[ https://issues.apache.org/jira/browse/SPARK-25841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109313#comment-17109313 ] Shyam commented on SPARK-25841: --- [~rxin] Is this fixed in the latest version, i.e. 2.4.3? Is this issue still persisting? > Redesign window function rangeBetween API > - > > Key: SPARK-25841 > URL: https://issues.apache.org/jira/browse/SPARK-25841 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 2.3.2, 2.4.0 >Reporter: Reynold Xin >Assignee: Reynold Xin >Priority: Major > > As I was reviewing the Spark API changes for 2.4, I found that through > organic, ad-hoc evolution the current API for window functions in Scala is > pretty bad. > > To illustrate the problem, we have two rangeBetween functions in the Window > class: > > {code:java} > class Window { > def unboundedPreceding: Long > ... > def rangeBetween(start: Long, end: Long): WindowSpec > def rangeBetween(start: Column, end: Column): WindowSpec > }{code} > > The Column version of rangeBetween was added in Spark 2.3 because the > previous version (Long) could only support integral values and not time > intervals. Now in order to support specifying unboundedPreceding in the > rangeBetween(Column, Column) API, we added an unboundedPreceding that returns > a Column in functions.scala. > > There are a few issues I have with the API: > > 1. To the end user, this can be just super confusing. Why are there two > unboundedPreceding functions, in different classes, that are named the same > but return different types? > > 2. Using Column as the parameter signature implies this can be an actual > Column, but in practice rangeBetween can only accept literal values. > > 3. We added the new APIs to support intervals, but they don't actually work, > because in the implementation we try to validate the start is less than the > end, but calendar interval types are not comparable, and as a result we throw > a type mismatch exception at runtime: scala.MatchError: CalendarIntervalType > (of class org.apache.spark.sql.types.CalendarIntervalType$) > > 4. In order to make interval work, users need to create an interval using > CalendarInterval, which is an internal class that has no documentation and no > stable API. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
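For reference, a small sketch of the Long-based overload that this ticket argues should stay the primary API; the Dataset df and its amount column are assumptions for illustration:

{code:java}
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;

// Long-based frame boundaries: Window.unboundedPreceding() and
// Window.currentRow() return sentinel Long values, not Columns.
WindowSpec runningFrame = Window
        .orderBy(col("amount"))
        .rangeBetween(Window.unboundedPreceding(), Window.currentRow());

// Assumes an existing Dataset<Row> df: running sum over all rows whose
// amount is <= the current row's amount.
Dataset<Row> withRunningSum =
        df.withColumn("running_sum", sum(col("amount")).over(runningFrame));
{code}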
[jira] [Commented] (SPARK-6761) Approximate quantile
[ https://issues.apache.org/jira/browse/SPARK-6761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071490#comment-17071490 ] Shyam commented on SPARK-6761: -- [~mengxr] can you please advise on this bug: https://issues.apache.org/jira/browse/SPARK-31310 > Approximate quantile > > > Key: SPARK-6761 > URL: https://issues.apache.org/jira/browse/SPARK-6761 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: L. C. Hsieh >Priority: Major > Fix For: 2.0.0 > > > See mailing list discussion: > http://apache-spark-developers-list.1001551.n3.nabble.com/Approximate-rank-based-statistics-median-95-th-percentile-etc-for-Spark-td11414.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31310) percentile_approx function not working as expected
Shyam created SPARK-31310: - Summary: percentile_approx function not working as expected Key: SPARK-31310 URL: https://issues.apache.org/jira/browse/SPARK-31310 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.3, 2.4.1, 2.4.0 Environment: spark-sql-2.4.1v with Java 8 Reporter: Shyam Fix For: 2.4.3 I'm using spark-sql-2.4.1v with Java 8 and I'm trying to find quantiles, i.e. percentile 0, percentile 25, etc., on a given column of a dataframe. The column's data set is as below: 23456.55,34532.55,23456.55 When I use the percentile_approx() function, the results do not match those of Excel's percentile_inc() function. Ex: for the above data set, i.e. 23456.55,34532.55,23456.55, the values for percentile_0,percentile_10,percentile_25,percentile_50,percentile_75,percentile_90,percentile_100 respectively using the percentile_approx() function are 23456.55,23456.55,23456.55,23456.55,23456.55,23456.55,23456.55 while using Excel, i.e. percentile_inc(), they are 23456.55,23456.55,23456.55,23456.55,28994.5503,32317.3502,34532.55 How can I get the same percentiles as Excel using the percentile_approx() function? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
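As far as I understand the semantics, percentile_approx picks actual elements of the data set rather than interpolating between neighbours, so with only three values it can legitimately return 23456.55 for every percentile, while Excel's percentile_inc interpolates. Spark SQL also ships an exact, interpolating percentile aggregate. A hedged sketch; the view and column names are made up:

{code:java}
// Assumes a SparkSession `spark` and a Dataset<Row> `df` with a numeric
// column named `value`.
df.createOrReplaceTempView("t");

// Exact, interpolating percentile: the closest match to Excel's
// percentile_inc().
spark.sql("SELECT percentile(value, array(0.0, 0.25, 0.5, 0.75, 1.0)) FROM t").show(false);

// Approximate percentile; raising the third (accuracy) argument trades
// memory for results closer to the exact ones.
spark.sql("SELECT percentile_approx(value, array(0.0, 0.25, 0.5, 0.75, 1.0), 10000) FROM t").show(false);
{code}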
[jira] [Updated] (SPARK-31057) approxQuantile function of Spark not taking List as first parameter
[ https://issues.apache.org/jira/browse/SPARK-31057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shyam updated SPARK-31057: -- Shepherd: Sean R. Owen > approxQuantile function of Spark not taking List as first parameter > -- > > Key: SPARK-31057 > URL: https://issues.apache.org/jira/browse/SPARK-31057 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.1, 2.4.3 > Environment: spark-sql-2.4.3 v and Eclipse Neon IDE. > >Reporter: Shyam >Priority: Major > > I am using spark-sql-2.4.1v in my project with Java 8. > I need to calculate the quantiles on some of the (calculated) columns > (i.e. con_dist_1, con_dist_2) of the dataframe df given below. > {{List<String> calcColmns = Arrays.asList("con_dist_1","con_dist_2")}} > When I try to use the first version of approxQuantile, i.e. > approxQuantile(List<String>, List<Double>, double), as below > Dataset<Row> df = //dataset > {{List<List<Double>> quants = df.stat().approxQuantile(calcColmns, > Array(0.0,0.1,0.5), 0.0);}} > *It gives the error:* > {quote}The method approxQuantile(String, double[], double) in the type > DataFrameStatFunctions is not applicable for the arguments (List<String>, List<Double>, > double) > {quote} > So what is wrong here? I am doing it in my Eclipse IDE. Why is it not > accepting the List even though I am passing a List<String>? > I'd really appreciate any help on this. > More details are added here: > [https://stackoverflow.com/questions/60550152/issue-with-approxquantile-of-spark-not-recognizing-liststring] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31057) approxQuantile function of Spark not taking List as first parameter
Shyam created SPARK-31057: - Summary: approxQuantile function of Spark not taking List as first parameter Key: SPARK-31057 URL: https://issues.apache.org/jira/browse/SPARK-31057 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.3, 2.4.1 Environment: spark-sql-2.4.3 v and Eclipse Neon IDE. Reporter: Shyam I am using spark-sql-2.4.1v in my project with Java 8. I need to calculate the quantiles on some of the (calculated) columns (i.e. con_dist_1, con_dist_2) of the dataframe df given below. {{List<String> calcColmns = Arrays.asList("con_dist_1","con_dist_2")}} When I try to use the first version of approxQuantile, i.e. approxQuantile(List<String>, List<Double>, double), as below Dataset<Row> df = //dataset {{List<List<Double>> quants = df.stat().approxQuantile(calcColmns, Array(0.0,0.1,0.5), 0.0);}} *It gives the error:* {quote}The method approxQuantile(String, double[], double) in the type DataFrameStatFunctions is not applicable for the arguments (List<String>, List<Double>, double) {quote} So what is wrong here? I am doing it in my Eclipse IDE. Why is it not accepting the List even though I am passing a List<String>? I'd really appreciate any help on this. More details are added here: [https://stackoverflow.com/questions/60550152/issue-with-approxquantile-of-spark-not-recognizing-liststring] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
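For what it's worth, the Java API of DataFrameStatFunctions takes plain arrays, not java.util.List; there is no List-based overload, which is why the compiler only resolves the (String, double[], double) signature. A sketch of the array-based multi-column call, assuming the Dataset<Row> df from the description:

{code:java}
// The multi-column overload: approxQuantile(String[], double[], double).
String[] cols = {"con_dist_1", "con_dist_2"};
double[] probabilities = {0.0, 0.1, 0.5};
double relativeError = 0.0;  // 0.0 requests the exact (more expensive) computation

// Returns one array of quantiles per input column.
double[][] quantiles = df.stat().approxQuantile(cols, probabilities, relativeError);
{code}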
[jira] [Commented] (SPARK-3875) Add TEMP DIRECTORY configuration
[ https://issues.apache.org/jira/browse/SPARK-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044467#comment-17044467 ] Shyam commented on SPARK-3875: -- [~srowen], I am still getting this error. It is suggested that this is the fix: https://issues.apache.org/jira/browse/SPARK-26825, but how do I implement it in my spark-streaming application? > Add TEMP DIRECTORY configuration > > > Key: SPARK-3875 > URL: https://issues.apache.org/jira/browse/SPARK-3875 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Patrick Liu >Priority: Major > > Currently, Spark uses "java.io.tmpdir" to find the /tmp/ directory. > Then, the /tmp/ directory is used to > 1. Set up the HTTP File Server > 2. Hold the broadcast directory > 3. Fetch dependency files or jars by executors > The size of the /tmp/ directory will keep growing, leaving less free space on the > system disk. > I think we could add a configuration "spark.tmp.dir" in conf/spark-env.sh or > conf/spark-defaults.conf to set this particular directory. Let's say, set the > directory to a data disk. > If "spark.tmp.dir" is not set, use the default "java.io.tmpdir" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
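For what it's worth, Spark already exposes spark.local.dir for scratch space (it is overridden by SPARK_LOCAL_DIRS, and by the cluster manager on YARN or Mesos). A minimal sketch of pointing it at a data disk, assuming a hypothetical mount at /data1:

{code:java}
import org.apache.spark.sql.SparkSession;

// A sketch only; the /data1 mount point is an assumption.
SparkSession spark = SparkSession.builder()
        .appName("local-dir-sketch")
        .config("spark.local.dir", "/data1/spark-tmp")  // scratch and shuffle space off the system disk
        .getOrCreate();
{code}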
[jira] [Commented] (SPARK-26825) Spark Structure Streaming job failing when submitted in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-26825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043593#comment-17043593 ] Shyam commented on SPARK-26825: --- [~asdaraujo] [~gsomogyi] I am facing the same issue in spark-sql-2.4.1; how can I replace/override what is returned by this Utils.createTempDir code? Any more clues? > Spark Structure Streaming job failing when submitted in cluster mode > > > Key: SPARK-26825 > URL: https://issues.apache.org/jira/browse/SPARK-26825 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Andre Araujo >Priority: Major > > I have a structured streaming job that runs successfully when launched in > "client" mode. However, when launched in "cluster" mode it fails with the > following weird messages on the error log. Note that the path in the error > message is actually a local filesystem path that has been mistakenly prefixed > with a {{hdfs://}} scheme. > {code} > 19/02/01 12:53:14 ERROR streaming.StreamMetadata: Error writing stream > metadata StreamMetadata(68f9fb30-5853-49b4-b192-f1e0483e0d95) to > hdfs://ns1/data/yarn/nm/usercache/root/appcache/application_1548823131831_0160/container_1548823131831_0160_02_01/tmp/temporary-3789423a-6ded-4084-aab3-3b6301c34e07/metadataorg.apache.hadoop.security.AccessControlException: > Permission denied: user=root, access=WRITE, > inode="/":hdfs:supergroup:drwxr-xr-x > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:256) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:194) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1853) > {code} > I dug a little bit into this and here's what I think is going on: > # When a new streaming query is created, the {{StreamingQueryManager}} > determines the checkpoint location > [here|https://github.com/apache/spark/blob/d811369ce23186cbb3208ad665e15408e13fea87/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala#L216]. > If neither the user nor the Spark conf specify a checkpoint location, the > location is returned by a call to {{Utils.createTempDir(namePrefix = > s"temporary").getCanonicalPath}}. >Here, I see two issues: > #* The canonical path returned by {{Utils.createTempDir}} does *not* have a > scheme ({{hdfs://}} or {{file://}}), so, it's ambiguous as to what type of > file system the path belongs to. > #* Also note that the path returned by the {{Utils.createTempDir}} call is a > local path, not a HDFS path, as the paths returned by the other two > conditions.
I executed {{Utils.createTempDir}} in a test job, both in cluster > and client modes, and the results are these: > {code} > *Client mode:* > java.io.tmpdir=/tmp > createTempDir(namePrefix = s"temporary") => > /tmp/temporary-c51f1466-fd50-40c7-b136-1f2f06672e25 > *Cluster mode:* > java.io.tmpdir=/yarn/nm/usercache/root/appcache/application_154906473_0029/container_154906473_0029_01_01/tmp/ > createTempDir(namePrefix = s"temporary") => > /yarn/nm/usercache/root/appcache/application_154906473_0029/container_154906473_0029_01_01/tmp/temporary-47c13b28-14bd-4d1b-8acc-3e445948415e > {code} > # This temporary checkpoint location is then [passed to the > constructor|https://github.com/apache/spark/blob/d811369ce23186cbb3208ad665e15408e13fea87/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala#L276] > of the {{MicroBatchExecution}} instance > # This is the point where [{{resolvedCheckpointRoot}} is > calculated|https://github.com/apache/spark/blob/755f9c20761e3db900c6c2b202cd3d2c5bbfb7c0/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L89]. > Here, it's where things start to break: since the path returned by > {{Utils.createTempDir}} doesn't have a scheme, and since HDFS is the default > filesystem, the code resolves the path as being a HDFS path, rather than a > local one, as shown below: > {code} > scala> import org.apache.hadoop.fs.Path > import org.apache.hadoop.fs.Path > scala> // value returned by the Utils.createTempDir method > scala> val checkpointRoot = > "/yarn/nm/usercache/root/appcache/application_154906473_0029/container_154906473_0029_01_01/tmp/temporary-47c13b28-14bd-4d1b-8acc-3e445948415e" > checkpointRoot: String = > /yarn/nm/usercache/root/appcache/application_154906473_0029/container_154906473_0029_01_01/tmp/temporary-47c13b28-14bd-4d1b-8acc-3e445948415e > sca
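A common way to sidestep the temporary-directory resolution described above is to give the query an explicit checkpoint location, so Utils.createTempDir is never consulted. A hedged sketch; the paths are hypothetical:

{code:java}
import org.apache.spark.sql.streaming.StreamingQuery;

// Assumes an existing streaming Dataset<Row> df.
StreamingQuery query = df.writeStream()
        .format("parquet")
        .option("path", "hdfs://ns1/data/output")  // hypothetical sink path
        .option("checkpointLocation", "hdfs://ns1/user/spark/checkpoints/my-query")
        .start();
{code}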
[jira] [Commented] (SPARK-23829) spark-sql-kafka source in spark 2.3 causes reading stream failure frequently
[ https://issues.apache.org/jira/browse/SPARK-23829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029696#comment-17029696 ] Shyam commented on SPARK-23829: --- [~gsomogyi] I am new to Kafka and Spark; what do you mean by the vanilla Spark version here? > spark-sql-kafka source in spark 2.3 causes reading stream failure frequently > > > Key: SPARK-23829 > URL: https://issues.apache.org/jira/browse/SPARK-23829 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0 >Reporter: Norman Bai >Priority: Major > Fix For: 2.4.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > In spark 2.3, it provides a source "spark-sql-kafka-0-10_2.11". > > When I wanted to read from my kafka-0.10.2.1 cluster, it threw out the error > "*java.util.concurrent.TimeoutException: Cannot fetch record for offset > in 12000 milliseconds*" frequently, and the job thus failed. > > I searched on Google & Stack Overflow for a while, and found many other people > who got this exception too, and nobody gave an answer. > > I debugged the source code, found nothing, but I guess it's because of the > kafka-clients dependency that spark-sql-kafka-0-10_2.11 is using. >
> {code:xml}
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
>   <version>2.3.0</version>
>   <exclusions>
>     <exclusion>
>       <artifactId>kafka-clients</artifactId>
>       <groupId>org.apache.kafka</groupId>
>     </exclusion>
>   </exclusions>
> </dependency>
> <dependency>
>   <groupId>org.apache.kafka</groupId>
>   <artifactId>kafka-clients</artifactId>
>   <version>0.10.2.1</version>
> </dependency>
> {code}
> I excluded it from Maven, added another version, reran the code, and > now it works. > > I guess something is wrong with kafka-clients 0.10.0.1 working with > Kafka 0.10.2.1, or more Kafka versions. > > Hope for an explanation. > Here is the error stack. > {code:java} > [ERROR] 2018-03-30 13:34:11,404 [stream execution thread for [id = > 83076cf1-4bf0-4c82-a0b3-23d8432f5964, runId = > b3e18aa6-358f-43f6-a077-e34db0822df6]] > org.apache.spark.sql.execution.streaming.MicroBatchExecution logError - Query > [id = 83076cf1-4bf0-4c82-a0b3-23d8432f5964, runId = > b3e18aa6-358f-43f6-a077-e34db0822df6] terminated with error > org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in > stage 0.0 failed 1 times, most recent failure: Lost task 6.0 in stage 0.0 > (TID 6, localhost, executor driver): java.util.concurrent.TimeoutException: > Cannot fetch record for offset 6481521 in 12 milliseconds > at > org.apache.spark.sql.kafka010.CachedKafkaConsumer.org$apache$spark$sql$kafka010$CachedKafkaConsumer$$fetchData(CachedKafkaConsumer.scala:230) > at > org.apache.spark.sql.kafka010.CachedKafkaConsumer$$anonfun$get$1.apply(CachedKafkaConsumer.scala:122) > at > org.apache.spark.sql.kafka010.CachedKafkaConsumer$$anonfun$get$1.apply(CachedKafkaConsumer.scala:106) > at > org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77) > at > org.apache.spark.sql.kafka010.CachedKafkaConsumer.runUninterruptiblyIfPossible(CachedKafkaConsumer.scala:68) > at > org.apache.spark.sql.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:106) > at > org.apache.spark.sql.kafka010.KafkaSourceRDD$$anon$1.getNext(KafkaSourceRDD.scala:157) > at > org.apache.spark.sql.kafka010.KafkaSourceRDD$$anon$1.getNext(KafkaSourceRDD.scala:148) > at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at >
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) > at > org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:107) > at > org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:105) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun
[jira] [Commented] (SPARK-25466) Documentation does not specify how to set Kafka consumer cache capacity for SS
[ https://issues.apache.org/jira/browse/SPARK-25466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993557#comment-16993557 ] Shyam commented on SPARK-25466: --- [~gsomogyi] I raised a ticket: https://issues.apache.org/jira/browse/SPARK-30222. Please let me know if there is anything specific you are looking for in order to suggest a fix. > Documentation does not specify how to set Kafka consumer cache capacity for SS > -- > > Key: SPARK-25466 > URL: https://issues.apache.org/jira/browse/SPARK-25466 > Project: Spark > Issue Type: Improvement > Components: Documentation, Structured Streaming >Affects Versions: 2.3.0 >Reporter: Patrick McGloin >Priority: Minor > > When hitting this warning with SS: > 19-09-2018 12:05:27 WARN CachedKafkaConsumer:66 - KafkaConsumer cache > hitting max capacity of 64, removing consumer for > CacheKey(spark-kafka-source-e06c9676-32c6-49c4-80a9-2d0ac4590609--694285871-executor,MyKafkaTopic-30) > If you Google you get to this page: > https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html > Which is for Spark Streaming and says to use this config item to adjust the > capacity: "spark.streaming.kafka.consumer.cache.maxCapacity". > This is a bit confusing as SS uses a different config item: > "spark.sql.kafkaConsumerCache.capacity" > Perhaps the SS Kafka documentation should talk about the consumer cache > capacity? Perhaps here? > https://spark.apache.org/docs/2.2.0/structured-streaming-kafka-integration.html > Or perhaps the warning message should reference the config item. E.g > 19-09-2018 12:05:27 WARN CachedKafkaConsumer:66 - KafkaConsumer cache > hitting max capacity of 64, removing consumer for > CacheKey(spark-kafka-source-e06c9676-32c6-49c4-80a9-2d0ac4590609--694285871-executor,MyKafkaTopic-30). > *The cache size can be adjusted with the setting > "spark.sql.kafkaConsumerCache.capacity".* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30222) Still getting KafkaConsumer cache hitting max capacity of 64, removing consumer for CacheKe
Shyam created SPARK-30222: - Summary: Still getting KafkaConsumer cache hitting max capacity of 64, removing consumer for CacheKe Key: SPARK-30222 URL: https://issues.apache.org/jira/browse/SPARK-30222 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.4.1 Environment: {{Below are the logs.}} 2019-12-11 08:33:31,504 [Executor task launch worker for task 1050] WARN org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting max capacity of 64, removing consumer for CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-21) 2019-12-11 08:33:32,493 [Executor task launch worker for task 1051] WARN org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting max capacity of 64, removing consumer for CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-5) 2019-12-11 08:33:32,570 [Executor task launch worker for task 1052] WARN org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting max capacity of 64, removing consumer for CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-9) 2019-12-11 08:33:33,441 [Executor task launch worker for task 1053] WARN org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting max capacity of 64, removing consumer for CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-17) 2019-12-11 08:33:33,619 [Executor task launch worker for task 1054] WARN org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting max capacity of 64, removing consumer for CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-2) 2019-12-11 08:33:34,474 [Executor task launch worker for task 1055] WARN org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting max capacity of 64, removing consumer for CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-10) 2019-12-11 08:33:35,006 [Executor task launch worker for task 1056] WARN org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting max capacity of 64, removing consumer for CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-6) 2019-12-11 08:33:36,326 [Executor task launch worker for task 1057] WARN org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting max capacity of 64, removing consumer for CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-14) 2019-12-11 08:33:36,634 [Executor task launch worker for task 1058] WARN org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting max capacity of 64, removing consumer for CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-0) 2019-12-11 08:33:37,496 [Executor task launch worker for task 1059] WARN org.apache.spark.sql.kafka010.KafkaDataConsumer - KafkaConsumer cache hitting max capacity of 64, removing consumer for CacheKey(spark-kafka-source-93ee3689-79f9-42e8-b1ee-e856570205ae-1923743483-executor,COMPANY_TRANSACTIONS_INBOUND-19) 2019-12-11 08:33:39,183 [stream execution thread for [id = b3aec196-e4f2-4ef9-973b-d5685eba917e, runId = 
5c35a63a-16ad-4899-b732-1019397770bd]] WARN org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor - Current batch is falling behind. The trigger interval is 15000 milliseconds, but spent 63438 milliseconds Reporter: Shyam Fix For: 2.4.1 I am using spark-sql-2.4.1 with Kafka 0.10. When I try to consume data with the consumer, it gives the warning below even after setting .option("spark.sql.kafkaConsumerCache.capacity", 128):
{code:java}
Dataset<Row> df = sparkSession
      .readStream()
      .format("kafka")
      .option("kafka.bootstrap.servers", SERVERS)
      .option("subscribe", TOPIC)
      .option("spark.sql.kafkaConsumerCache.capacity", 128)
      .load();
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
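As far as I can tell, spark.sql.kafkaConsumerCache.capacity is a Spark configuration, not a Kafka source option, so setting it via .option() on the reader has no effect; it has to be set on the session (or via spark-submit --conf) before the job starts. Note also that the message is only a WARN about cache eviction, not a failure in itself; sizing the cache to at least the number of topic partitions handled per executor should silence it. A sketch:

{code:java}
import org.apache.spark.sql.SparkSession;

// Set the cache capacity on the session configuration, not on the Kafka reader.
SparkSession sparkSession = SparkSession.builder()
        .appName("kafka-cache-sketch")
        .config("spark.sql.kafkaConsumerCache.capacity", "128")
        .getOrCreate();
{code}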
[jira] [Commented] (SPARK-25466) Documentation does not specify how to set Kafka consumer cache capacity for SS
[ https://issues.apache.org/jira/browse/SPARK-25466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968435#comment-16968435 ] Shyam commented on SPARK-25466: --- [~gsomogyi] I am still facing the same issue; how do I fix it? I tried the things mentioned here on Stack Overflow: [https://stackoverflow.com/questions/58456939/how-to-set-spark-consumer-cache-to-fix-kafkaconsumer-cache-hitting-max-capaci] > Documentation does not specify how to set Kafka consumer cache capacity for SS > -- > > Key: SPARK-25466 > URL: https://issues.apache.org/jira/browse/SPARK-25466 > Project: Spark > Issue Type: Improvement > Components: Documentation, Structured Streaming >Affects Versions: 2.3.0 >Reporter: Patrick McGloin >Priority: Minor > > When hitting this warning with SS: > 19-09-2018 12:05:27 WARN CachedKafkaConsumer:66 - KafkaConsumer cache > hitting max capacity of 64, removing consumer for > CacheKey(spark-kafka-source-e06c9676-32c6-49c4-80a9-2d0ac4590609--694285871-executor,MyKafkaTopic-30) > If you Google you get to this page: > https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html > Which is for Spark Streaming and says to use this config item to adjust the > capacity: "spark.streaming.kafka.consumer.cache.maxCapacity". > This is a bit confusing as SS uses a different config item: > "spark.sql.kafkaConsumerCache.capacity" > Perhaps the SS Kafka documentation should talk about the consumer cache > capacity? Perhaps here? > https://spark.apache.org/docs/2.2.0/structured-streaming-kafka-integration.html > Or perhaps the warning message should reference the config item. E.g > 19-09-2018 12:05:27 WARN CachedKafkaConsumer:66 - KafkaConsumer cache > hitting max capacity of 64, removing consumer for > CacheKey(spark-kafka-source-e06c9676-32c6-49c4-80a9-2d0ac4590609--694285871-executor,MyKafkaTopic-30). > *The cache size can be adjusted with the setting > "spark.sql.kafkaConsumerCache.capacity".* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29710) Seeing offsets not resetting even when reset policy is configured explicitly
[ https://issues.apache.org/jira/browse/SPARK-29710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964696#comment-16964696 ] Shyam commented on SPARK-29710: --- @Gabor Somogyi Can you please help me? What is wrong here? > Seeing offsets not resetting even when reset policy is configured explicitly > > > Key: SPARK-29710 > URL: https://issues.apache.org/jira/browse/SPARK-29710 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.1 > Environment: Windows 10, Eclipse Neon >Reporter: Shyam >Priority: Major > > > Even after setting *"auto.offset.reset" to "latest"*, I am getting the > error below: > > org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of > range with no configured reset policy for partitions: > \{COMPANY_TRANSACTIONS_INBOUND-16=168}org.apache.kafka.clients.consumer.OffsetOutOfRangeException: > Offsets out of range with no configured reset policy for partitions: > \{COMPANY_TRANSACTIONS_INBOUND-16=168} at > org.apache.kafka.clients.consumer.internals.Fetcher.throwIfOffsetOutOfRange(Fetcher.java:348) > at > org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:396) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:999) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937) > at > org.apache.spark.sql.kafka010.InternalKafkaConsumer.fetchData(KafkaDataConsumer.scala:470) > at > org.apache.spark.sql.kafka010.InternalKafkaConsumer.org$apache$spark$sql$kafka010$InternalKafkaConsumer$$fetchRecord(KafkaDataConsumer.scala:361) > at > org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:251) > at > org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:234) > at > org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77) > at > org.apache.spark.sql.kafka010.InternalKafkaConsumer.runUninterruptiblyIfPossible(KafkaDataConsumer.scala:209) > at > org.apache.spark.sql.kafka010.InternalKafkaConsumer.get(KafkaDataConsumer.scala:234) > > [https://stackoverflow.com/questions/58653885/even-after-setting-auto-offset-reset-to-latest-getting-error-offsetoutofrang] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29710) Seeing offsets not resetting even when reset policy is configured explicitly
Shyam created SPARK-29710: - Summary: Seeing offsets not resetting even when reset policy is configured explicitly Key: SPARK-29710 URL: https://issues.apache.org/jira/browse/SPARK-29710 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.4.1 Environment: Windows 10, Eclipse Neon Reporter: Shyam Even after setting *"auto.offset.reset" to "latest"*, I am getting the error below: org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: \{COMPANY_TRANSACTIONS_INBOUND-16=168}org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: \{COMPANY_TRANSACTIONS_INBOUND-16=168} at org.apache.kafka.clients.consumer.internals.Fetcher.throwIfOffsetOutOfRange(Fetcher.java:348) at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:396) at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:999) at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937) at org.apache.spark.sql.kafka010.InternalKafkaConsumer.fetchData(KafkaDataConsumer.scala:470) at org.apache.spark.sql.kafka010.InternalKafkaConsumer.org$apache$spark$sql$kafka010$InternalKafkaConsumer$$fetchRecord(KafkaDataConsumer.scala:361) at org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:251) at org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:234) at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77) at org.apache.spark.sql.kafka010.InternalKafkaConsumer.runUninterruptiblyIfPossible(KafkaDataConsumer.scala:209) at org.apache.spark.sql.kafka010.InternalKafkaConsumer.get(KafkaDataConsumer.scala:234) [https://stackoverflow.com/questions/58653885/even-after-setting-auto-offset-reset-to-latest-getting-error-offsetoutofrang] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
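In Structured Streaming the Kafka source manages offsets itself; as far as I know, kafka.auto.offset.reset set on the consumer is not honored. The equivalent source option is startingOffsets, and failOnDataLoss controls what happens when checkpointed offsets have already been aged out by Kafka's retention, which is the usual cause of this OffsetOutOfRangeException. A sketch, reusing the sparkSession from the description with hypothetical servers:

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> df = sparkSession
        .readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "host1:9092")        // hypothetical broker list
        .option("subscribe", "COMPANY_TRANSACTIONS_INBOUND")
        .option("startingOffsets", "latest")  // used instead of auto.offset.reset
        .option("failOnDataLoss", "false")    // don't fail when offsets fell off retention
        .load();
{code}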
[jira] [Commented] (SPARK-22000) org.codehaus.commons.compiler.CompileException: toString method is not declared
[ https://issues.apache.org/jira/browse/SPARK-22000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961159#comment-16961159 ] Shyam commented on SPARK-22000: --- [~srowen] I am using Spark 2.4.1 and getting the same error. For more details please check this: [https://stackoverflow.com/questions/58593215/inserting-into-cassandra-table-from-spark-dataframe-results-in-org-codehaus-comm] How do I fix this? > org.codehaus.commons.compiler.CompileException: toString method is not > declared > --- > > Key: SPARK-22000 > URL: https://issues.apache.org/jira/browse/SPARK-22000 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: taiho choi >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.0.0 > > Attachments: testcase.zip > > > The error message says that toString is not declared on "value13", which is of > primitive "long" type in the generated code. > I think value13 should be of Long type. > ==error message > Caused by: org.codehaus.commons.compiler.CompileException: File > 'generated.java', Line 70, Column 32: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 70, Column 32: A method named "toString" is not declared in any enclosing > class nor any supertype, nor through a static import > /* 033 */ private void apply1_2(InternalRow i) { > /* 034 */ > /* 035 */ > /* 036 */ boolean isNull11 = i.isNullAt(1); > /* 037 */ UTF8String value11 = isNull11 ? null : (i.getUTF8String(1)); > /* 038 */ boolean isNull10 = true; > /* 039 */ java.lang.String value10 = null; > /* 040 */ if (!isNull11) { > /* 041 */ > /* 042 */ isNull10 = false; > /* 043 */ if (!isNull10) { > /* 044 */ > /* 045 */ Object funcResult4 = null; > /* 046 */ funcResult4 = value11.toString(); > /* 047 */ > /* 048 */ if (funcResult4 != null) { > /* 049 */ value10 = (java.lang.String) funcResult4; > /* 050 */ } else { > /* 051 */ isNull10 = true; > /* 052 */ } > /* 053 */ > /* 054 */ > /* 055 */ } > /* 056 */ } > /* 057 */ javaBean.setApp(value10); > /* 058 */ > /* 059 */ > /* 060 */ boolean isNull13 = i.isNullAt(12); > /* 061 */ long value13 = isNull13 ? -1L : (i.getLong(12)); > /* 062 */ boolean isNull12 = true; > /* 063 */ java.lang.String value12 = null; > /* 064 */ if (!isNull13) { > /* 065 */ > /* 066 */ isNull12 = false; > /* 067 */ if (!isNull12) { > /* 068 */ > /* 069 */ Object funcResult5 = null; > /* 070 */ funcResult5 = value13.toString(); > /* 071 */ > /* 072 */ if (funcResult5 != null) { > /* 073 */ value12 = (java.lang.String) funcResult5; > /* 074 */ } else { > /* 075 */ isNull12 = true; > /* 076 */ } > /* 077 */ > /* 078 */ > /* 079 */ } > /* 080 */ } > /* 081 */ javaBean.setReasonCode(value12); > /* 082 */ > /* 083 */ } -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
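Until the fix in 3.0.0 noted above, one workaround I have seen is to cast the offending numeric column to string before mapping to the bean, so the generated code never calls toString on a primitive long. A hedged sketch; MyEvent is a hypothetical bean with a String reasonCode field:

{code:java}
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

// Assumes an existing Dataset<Row> df whose reasonCode column is a long.
Dataset<Row> casted = df.withColumn("reasonCode", col("reasonCode").cast("string"));

// The bean encoder now sees a String column and generates no long.toString().
Dataset<MyEvent> events = casted.as(Encoders.bean(MyEvent.class));
{code}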
[jira] [Commented] (SPARK-10420) Implementing Reactive Streams based Spark Streaming Receiver
[ https://issues.apache.org/jira/browse/SPARK-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157001#comment-15157001 ] Shyam commented on SPARK-10420: --- Hi all, is there any activity around this ticket? > Implementing Reactive Streams based Spark Streaming Receiver > > > Key: SPARK-10420 > URL: https://issues.apache.org/jira/browse/SPARK-10420 > Project: Spark > Issue Type: Improvement > Components: Streaming >Reporter: Nilanjan Raychaudhuri >Priority: Minor > > Hello TD, > This is probably the last bit of the back-pressure story, implementing > ReactiveStreams based Spark streaming receivers. After discussing about this > with my Typesafe team we came up with the following design document > https://docs.google.com/document/d/1lGQKXfNznd5SPuQigvCdLsudl-gcvWKuHWr0Bpn3y30/edit?usp=sharing > Could you please take a look at this when you get a chance? > Thanks > Nilanjan -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4354) 14/11/12 09:39:00 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, HYD-RNDNW-VFRCO-RCORE2): java.lang.NoClassDefFoundError: Could not initialize class org.xerial
[ https://issues.apache.org/jira/browse/SPARK-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213375#comment-14213375 ] Shyam commented on SPARK-4354: -- Will somebody please enlighten me on the mentioned exception and how to get rid of it? Thank you. > 14/11/12 09:39:00 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, > HYD-RNDNW-VFRCO-RCORE2): java.lang.NoClassDefFoundError: Could not initialize > class org.xerial.snappy.Snappy > -- > > Key: SPARK-4354 > URL: https://issues.apache.org/jira/browse/SPARK-4354 > Project: Spark > Issue Type: Question > Components: Examples >Affects Versions: 1.1.0 > Environment: Linux >Reporter: Shyam > Labels: newbie > Attachments: client-exception.txt > > > Prebuilt Spark for Hadoop 2.4 installed in 4 redhat linux machines > Standalone cluster mode. > Machine 1(Master) > Machine 2(Worker node 1) > Machine 3(Worker node 2) > Machine 4(Client for executing spark examples) > I ran below mentioned command in Machine 4 then got exception mentioned in > the summary of this issue. > sh spark-submit --class org.apache.spark.examples.SparkPi --jars > /FS/lib/spark-assembly-1.1.0-hadoop2.4.0.jar --master spark://MasterIP:7077 > --deploy-mode client /FS/lib/spark-examples-1.1.0-hadoop2.4.0.jar 10 > java.lang.NoClassDefFoundError: Could not initialize class > org.xerial.snappy.Snappy -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4354) 14/11/12 09:39:00 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, HYD-RNDNW-VFRCO-RCORE2): java.lang.NoClassDefFoundError: Could not initialize class org.xerial.s
[ https://issues.apache.org/jira/browse/SPARK-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shyam updated SPARK-4354: - Attachment: client-exception.txt The process of execution and mentioned exception can be found in the text file attached. > 14/11/12 09:39:00 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, > HYD-RNDNW-VFRCO-RCORE2): java.lang.NoClassDefFoundError: Could not initialize > class org.xerial.snappy.Snappy > -- > > Key: SPARK-4354 > URL: https://issues.apache.org/jira/browse/SPARK-4354 > Project: Spark > Issue Type: Question > Components: Examples >Affects Versions: 1.1.0 > Environment: Linux >Reporter: Shyam > Labels: newbie > Attachments: client-exception.txt > > > Prebuilt Spark for Hadoop 2.4 installed in 4 redhat linux machines > Standalone cluster mode. > Machine 1(Master) > Machine 2(Worker node 1) > Machine 3(Worker node 2) > Machine 4(Client for executing spark examples) > I ran below mentioned command in Machine 4 then got exception mentioned in > the summary of this issue. > sh spark-submit --class org.apache.spark.examples.SparkPi --jars > /FS/lib/spark-assembly-1.1.0-hadoop2.4.0.jar --master spark://MasterIP:7077 > --deploy-mode client /FS/lib/spark-examples-1.1.0-hadoop2.4.0.jar 10 > java.lang.NoClassDefFoundError: Could not initialize class > org.xerial.snappy.Snappy -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-4354) 14/11/12 09:39:00 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, HYD-RNDNW-VFRCO-RCORE2): java.lang.NoClassDefFoundError: Could not initialize class org.xerial.s
Shyam created SPARK-4354: Summary: 14/11/12 09:39:00 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, HYD-RNDNW-VFRCO-RCORE2): java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy Key: SPARK-4354 URL: https://issues.apache.org/jira/browse/SPARK-4354 Project: Spark Issue Type: Question Components: Examples Affects Versions: 1.1.0 Environment: Linux Reporter: Shyam Prebuilt Spark for Hadoop 2.4 installed in 4 redhat linux machines Standalone cluster mode. Machine 1(Master) Machine 2(Worker node 1) Machine 3(Worker node 2) Machine 4(Client for executing spark examples) I ran below mentioned command in Machine 4 then got exception mentioned in the summary of this issue. sh spark-submit --class org.apache.spark.examples.SparkPi --jars /FS/lib/spark-assembly-1.1.0-hadoop2.4.0.jar --master spark://MasterIP:7077 --deploy-mode client /FS/lib/spark-examples-1.1.0-hadoop2.4.0.jar 10 java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
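Two things commonly cause this NoClassDefFoundError, though both are assumptions about this particular setup: snappy-java failing to extract its native library into a /tmp mounted noexec, and a second copy of the snappy classes being pulled in by passing the assembly jar through --jars when it is already on Spark's classpath. A hedged sketch of a resubmit that drops the redundant --jars and, purely as a diagnostic, switches the compression codec away from snappy:

{code}
# The assembly jar is already on the cluster's classpath, so it is not passed
# via --jars here. Switching the codec is a diagnostic, not a permanent fix.
sh spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://MasterIP:7077 \
  --deploy-mode client \
  --conf spark.io.compression.codec=lzf \
  /FS/lib/spark-examples-1.1.0-hadoop2.4.0.jar 10
{code}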