(Spark-ml) java.util.NoSuchElementException: key not found exception when doing prediction and computing test error.
Thanks, it's working now. My test data had some labels that were not present in the training set.
Re: (Spark-ml) java.util.NoSuchElementException: key not found exception when doing prediction and computing test error.
Hi Neha

This generally occurs when your test data contains a value of a categorical variable that was not present in your training data. For example, suppose you have a column DAYS with the values M, T, W in the training data; when your test data then contains F, the transform fails with a key-not-found exception. Please look into this, and if that's not the case, could you please share your code and training/testing data for a better understanding?

Regards
Pralabh Kumar
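For readers landing on this thread: in a spark.ml pipeline, the lookup that fails here is typically inside a StringIndexer (or a similar categorical encoder) that was fit on the training data and is then asked to transform a label it has never seen. A minimal sketch of the usual mitigation follows; the DataFrame and column names are assumptions for illustration, not Neha's actual code. Note that in Spark 2.0.2 handleInvalid accepts only "error" and "skip" ("keep" was added in a later release).

```scala
import org.apache.spark.ml.feature.StringIndexer

// Fit the indexer on the training data only. With handleInvalid = "skip",
// test rows whose category (e.g. DAYS = "F") never appeared in training are
// dropped during transform() instead of throwing
// java.util.NoSuchElementException: key not found.
val indexer = new StringIndexer()
  .setInputCol("DAYS")
  .setOutputCol("DAYS_indexed")
  .setHandleInvalid("skip")

val indexerModel = indexer.fit(trainingDF)   // trainingDF, testDF: assumed DataFrames
val indexedTest  = indexerModel.transform(testDF)
```

On newer Spark versions, setHandleInvalid("keep") instead maps unseen labels to an extra index, which keeps those rows rather than silently dropping them.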
Fwd: (Spark-ml) java.util.NoSuchElementException: key not found exception when doing prediction and computing test error.
Hi,

I am using Apache Spark 2.0.2 RandomForest from spark.ml (standalone mode) for text classification, with a TF-IDF feature extractor. The training part runs without any issues and returns 100% accuracy. But when I try to do prediction with the trained model and compute the test error, it fails with java.util.NoSuchElementException: key not found. Any help will be much appreciated.

Thanks
(Spark-ml) java.util.NoSuchElementException: key not found exception when doing prediction and computing test error.
Hi,

I am using Apache Spark 2.0.2 RandomForest from spark.ml (standalone mode) for text classification, with a TF-IDF feature extractor. The training part runs without any issues and returns 100% accuracy. But when I try to do prediction with the trained model and compute the test error, it fails with java.util.NoSuchElementException: key not found. Any help will be much appreciated.

Thanks & Regards
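For context, the failing step in a setup like this is presumably something along the lines of the sketch below; it is a reconstruction for illustration (pipelineModel, testData, and the column names are assumptions, not the actual code from this thread). The exception is thrown inside transform(), before the evaluator ever runs, when the test set contains a label the fitted label indexer never saw.

```scala
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator

// Score the held-out set with the fitted pipeline, then compute test error.
val predictions = pipelineModel.transform(testData)

val evaluator = new MulticlassClassificationEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setMetricName("accuracy")

val testError = 1.0 - evaluator.evaluate(predictions)
println(s"Test error = $testError")
```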
Re: java.util.NoSuchElementException: key not found error
This is https://issues.apache.org/jira/browse/SPARK-10422, which has been fixed in Spark 1.5.1.
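For anyone who cannot upgrade immediately: SPARK-10422 is a bug in the DictionaryEncoding compression used by the in-memory columnar cache (the DictionaryEncoding$Encoder.compress frame in the reported trace), so a commonly suggested stopgap is to disable compressed columnar storage before caching. This is a hedged workaround sketch, not a substitute for moving to 1.5.1:

```scala
// Workaround sketch for SPARK-10422 on Spark 1.5.0: avoid the
// DictionaryEncoding code path by turning off compression in the
// in-memory columnar store, then cache as before.
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "false")

val splitData = merged.randomSplit(Array(70, 30))  // weights are normalized internally
val trainData = splitData(0).persist()
val testData  = splitData(1)
trainData.registerTempTable("trn")
```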
java.util.NoSuchElementException: key not found error
In 1.5.0, if I use randomSplit on a data frame I get this error. Here is the code snippet:

val splitData = merged.randomSplit(Array(70,30))
val trainData = splitData(0).persist()
val testData = splitData(1)

trainData.registerTempTable("trn")

%sql select * from trn

The exception goes like this:

java.util.NoSuchElementException: key not found: 1910
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at scala.collection.AbstractMap.default(Map.scala:58)
    at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
    at org.apache.spark.sql.columnar.compression.DictionaryEncoding$Encoder.compress(compressionSchemes.scala:258)
    at org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.build(CompressibleColumnBuilder.scala:110)
    at org.apache.spark.sql.columnar.NativeColumnBuilder.build(ColumnBuilder.scala:87)
    at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
    at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:152)
    at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:120)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Any idea?

regards,
Sourav
Re: calling persist would cause java.util.NoSuchElementException: key not found:
Do you have the full stack trace? Could you check if it's the same as https://issues.apache.org/jira/browse/SPARK-10422?

Best Regards,
Shixiong Zhu
calling persist would cause java.util.NoSuchElementException: key not found:
Hi,

I am trying to call .persist() on a dataframe, but once I execute the next line I get java.util.NoSuchElementException: key not found: ….

I tried persisting to disk as well; same thing.

I am using pyspark with python3, Spark 1.5.

Thanks!

EYAD SIBAI
Risk Engineer

iZettle ®
Mobile: +46 72 911 60 54
Web: www.izettle.com
Re: Re: java.util.NoSuchElementException: key not found
Thank you very much! When will Spark 1.5.1 come out?

guoqing0...@yahoo.com.hk

From: Yin Huai
Date: 2015-09-12 04:49
To: guoqing0...@yahoo.com.hk
CC: user
Subject: Re: java.util.NoSuchElementException: key not found
Re: java.util.NoSuchElementException: key not found
Looks like you hit https://issues.apache.org/jira/browse/SPARK-10422; it has been fixed in branch-1.5, and the 1.5.1 release will have it.
java.util.NoSuchElementException: key not found
Hi all,

After upgrading Spark to 1.5, Streaming occasionally throws java.util.NoSuchElementException: key not found. Could a problem with the data cause this error? Please help if anyone has run into a similar problem before. Thanks very much.

The exception occurs when writing into the database:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 76, slave2): java.util.NoSuchElementException: key not found: ruixue.sys.session.request
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at scala.collection.AbstractMap.default(Map.scala:58)
    at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
    at org.apache.spark.sql.columnar.compression.DictionaryEncoding$Encoder.compress(compressionSchemes.scala:258)
    at org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.build(CompressibleColumnBuilder.scala:110)
    at org.apache.spark.sql.columnar.NativeColumnBuilder.build(ColumnBuilder.scala:87)
    at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
    at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1$$anonfun$next$2.apply(InMemoryColumnarTableScan.scala:152)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:152)
    at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:120)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)

guoqing0...@yahoo.com.hk
RE: SparkSql - java.util.NoSuchElementException: key not found: node when access JSON Array
You can use the HiveContext instead of the SQLContext; it supports all of HiveQL, including LATERAL VIEW explode, which SQLContext does not support yet. BTW, nice coding format in the email.

Yong

Date: Tue, 31 Mar 2015 18:18:19 -0400
Subject: Re: SparkSql - java.util.NoSuchElementException: key not found: node when access JSON Array
From: tsind...@gmail.com
To: user@spark.apache.org

So in looking at this a bit more, I gather the root cause is the fact that the nested fields are represented as rows within rows; is that correct? If I don't know the size of the JSON array (it varies), using x.getAs[Row](0).getString(0) is not really a valid solution.

Is the solution to apply a lateral view + explode to this? I have attempted to change to a lateral view, but it looks like my syntax is off:

sqlContext.sql(
  "SELECT path,`timestamp`, name, value, pe.value FROM metric lateral view explode(pathElements) a AS pe")
  .collect.foreach(println(_))

Which results in:

15/03/31 17:38:34 INFO ContextCleaner: Cleaned broadcast 0
Exception in thread "main" java.lang.RuntimeException: [1.68] failure: ``UNION'' expected but identifier view found

SELECT path,`timestamp`, name, value, pe.value FROM metric lateral view explode(pathElements) a AS pe
                                                                   ^
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(SparkSQLParser.scala:33)
    at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:79)
    at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:79)
    at org.apache.spark.sql.catalyst.SparkSQLParser$$anonfun$org$apache$spark$sql$catalyst$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:174)
    at org.apache.spark.sql.catalyst.SparkSQLParser$$anonfun$org$apache$spark$sql$catalyst$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:173)
    at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
    at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
    at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
    at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
    at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
    at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
    at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(SparkSQLParser.scala:31)
    at org.apache.spark.sql.SQLContext$$anonfun$parseSql$1.apply(SQLContext.scala:83)
    at org.apache.spark.sql.SQLContext$$anonfun$parseSql$1.apply(SQLContext.scala:83)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:83)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:303)
    at com.opsdatastore.elasticsearch.spark.ElasticSearchReadWrite$.main(ElasticSearchReadWrite.scala:97)
    at com.opsdatastore.elasticsearch.spark.ElasticSearchReadWrite.main(ElasticSearchReadWrite.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Is this the right approach? Is this syntax available in 1.2.1?

SELECT v1.name, v2.city, v2.state
FROM people
  LATERAL VIEW json_tuple(people.jsonObject, 'name', 'address') v1 as name, address
  LATERAL VIEW json_tuple(v1.address, 'city', 'state') v2 as city, state;

-Todd

On Tue, Mar 31, 2015 at 3:26 PM, Todd
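Yong's suggestion can be sketched as follows; this assumes the Spark 1.2/1.3-era API from the thread, and the table and column names are taken from Todd's query rather than verified against his actual code:

```scala
import org.apache.spark.sql.hive.HiveContext

// HiveContext parses HiveQL, which supports LATERAL VIEW explode;
// the plain SQLContext parser in this Spark version does not.
val hiveContext = new HiveContext(sc)

// Register the data against the HiveContext (assumed: metricDF holds the
// ES-backed rows, with pathElements as an array-of-struct column).
metricDF.registerTempTable("metric")

hiveContext.sql(
  """SELECT path, `timestamp`, name, value, pe.node, pe.value
    |FROM metric
    |LATERAL VIEW explode(pathElements) a AS pe""".stripMargin)
  .collect()
  .foreach(println)
```

Each exploded element pe is a struct, so pe.node and pe.value address the node/value pairs that the direct row-within-row access was tripping over.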
Re: SparkSql - java.util.NoSuchElementException: key not found: node when access JSON Array
esData.collect.foreach(println(_))

And results in this:

15/03/31 14:37:48 INFO DAGScheduler: Job 0 finished: collect at ElasticSearchReadWrite.scala:67, took 4.948556 s
(AUxxDrs4cgadF5SlaMg0,Map(pathElements -> Buffer(Map(node -> State, value -> PA), Map(node -> City, value -> Pittsburgh), Map(node -> Street, value -> 12345 Westbrook Drive), Map(node -> level, value -> main), Map(node -> device, value -> thermostat)), value -> 29.590943279257175, name -> Current Temperature, timestamp -> 2015-03-27T14:53:46+, path -> /PA/Pittsburgh/12345 Westbrook Drive/main/theromostat-1))
SparkSql - java.util.NoSuchElementException: key not found: node when access JSON Array
Map(node -> Street, value -> 12345 Westbrook Drive), Map(node -> level, value -> main), Map(node -> device, value -> thermostat)), value -> 29.590943279257175, name -> Current Temperature, timestamp -> 2015-03-27T14:53:46+, path -> /PA/Pittsburgh/12345 Westbrook Drive/main/theromostat-1))

Yet this fails:

sqlContext.sql("SELECT path, pathElements, `timestamp`, name, value FROM metric").collect.foreach(println(_))

With this exception:

Create Metric Temporary Table for querying
# Schema Definition #
root
# Data from SparkSQL #
15/03/31 14:37:49 INFO BlockManager: Removing broadcast 0
15/03/31 14:37:49 INFO BlockManager: Removing block broadcast_0
15/03/31 14:37:49 INFO MemoryStore: Block broadcast_0 of size 1264 dropped from memory (free 278018576)
15/03/31 14:37:49 INFO BlockManager: Removing block broadcast_0_piece0
15/03/31 14:37:49 INFO MemoryStore: Block broadcast_0_piece0 of size 864 dropped from memory (free 278019440)
15/03/31 14:37:49 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.1.5:57820 in memory (size: 864.0 B, free: 265.1 MB)
15/03/31 14:37:49 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/31 14:37:49 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.1.5:57834 in memory (size: 864.0 B, free: 530.0 MB)
15/03/31 14:37:49 INFO ContextCleaner: Cleaned broadcast 0
15/03/31 14:37:49 INFO ScalaEsRowRDD: Reading from [device/metric]
15/03/31 14:37:49 INFO ScalaEsRowRDD: Discovered mapping {device=[mappings=[metric=[name=STRING, path=STRING, pathElements=[node=STRING, value=STRING], pathId=STRING, timestamp=DATE, value=DOUBLE]]]} for [device/metric]
15/03/31 14:37:49 INFO SparkContext: Starting job: collect at SparkPlan.scala:84
15/03/31 14:37:49 INFO DAGScheduler: Got job 1 (collect at SparkPlan.scala:84) with 1 output partitions (allowLocal=false)
15/03/31 14:37:49 INFO DAGScheduler: Final stage: Stage 1(collect at SparkPlan.scala:84)
15/03/31 14:37:49 INFO DAGScheduler: Parents of final stage: List()
15/03/31 14:37:49 INFO DAGScheduler: Missing parents: List()
15/03/31 14:37:49 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[6] at map at SparkPlan.scala:84), which has no missing parents
15/03/31 14:37:49 INFO MemoryStore: ensureFreeSpace(4120) called with curMem=0, maxMem=278019440
15/03/31 14:37:49 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.0 KB, free 265.1 MB)
15/03/31 14:37:49 INFO MemoryStore: ensureFreeSpace(2403) called with curMem=4120, maxMem=278019440
15/03/31 14:37:49 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 265.1 MB)
15/03/31 14:37:49 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.5:57820 (size: 2.3 KB, free: 265.1 MB)
15/03/31 14:37:49 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/03/31 14:37:49 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/03/31 14:37:49 INFO DAGScheduler: Submitting 1 missing tasks from Stage 1 (MappedRDD[6] at map at SparkPlan.scala:84)
15/03/31 14:37:49 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/03/31 14:37:49 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, 192.168.1.5, NODE_LOCAL, 3731 bytes)
15/03/31 14:37:50 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.5:57836 (size: 2.3 KB, free: 530.0 MB)
15/03/31 14:37:52 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, 192.168.1.5): java.util.NoSuchElementException: key not found: node
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at scala.collection.AbstractMap.default(Map.scala:58)
    at scala.collection.MapLike$class.apply(MapLike.scala:141)
    at scala.collection.AbstractMap.apply(Map.scala:58)
    at org.elasticsearch.spark.sql.RowValueReader$class.addToBuffer(RowValueReader.scala:32)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.addToBuffer(ScalaRowValueReader.scala:9)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.addToMap(ScalaRowValueReader.scala:16)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:596)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:519)
    at org.elasticsearch.hadoop.serialization.ScrollReader.list(ScrollReader.java:560)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:522)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:596)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:519)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:339)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:290)
    at org.elasticsearch.ha
Re: java.util.NoSuchElementException: key not found:
aha ok, thanks. If I create different RDDs from a parent RDD and force evaluation thread-by-thread, then it should presumably be fine, correct? Or do I need to checkpoint the child RDDs as a precaution, in case they need to be removed from memory and recomputed?

On Sat, Feb 28, 2015 at 4:28 AM, Shixiong Zhu wrote:

> RDD is not thread-safe. You should not use it in multiple threads.
>
> Best Regards,
> Shixiong Zhu
>
> 2015-02-27 23:14 GMT+08:00 rok :
>
>> I'm seeing this java.util.NoSuchElementException: key not found: exception
>> pop up sometimes when I run operations on an RDD from multiple threads in a
>> python application. It ends up shutting down the SparkContext so I'm
>> assuming this is a bug -- from what I understand, I should be able to run
>> operations on the same RDD from multiple threads or is this not
>> recommended?
>>
>> I can't reproduce it all the time and I've tried eliminating caching
>> wherever possible to see if that would have an effect, but it doesn't seem
>> to. Each thread first splits the base RDD and then runs the
>> LogisticRegressionWithSGD on the subset.
>>
>> Is there a workaround to this exception?
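The failure mode being discussed -- concurrent access to one shared structure surfacing as a "key not found" error -- can be sketched outside Spark with a plain-Python analogue (this is an illustration of the race, not Spark code):

```python
import threading

# A shared mutable map mutated from several threads. Without
# synchronization, two interleaved threads can pop a key the other
# has already removed -- surfacing as KeyError, Python's analogue of
# Scala's NoSuchElementException: key not found.
shared = {}
lock = threading.Lock()
results = []

def safe_update(key):
    # With the lock, the insert/remove pair is atomic. The alternative,
    # as suggested in this thread, is to give each thread its own
    # private state and never share the mutable object at all.
    with lock:
        shared[key] = key * 2
        results.append(shared.pop(key))

threads = [threading.Thread(target=safe_update, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the join, every thread has produced its value and the shared map is empty; removing the lock is what reintroduces the race.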
Re: java.util.NoSuchElementException: key not found:
RDD is not thread-safe. You should not use it in multiple threads.

Best Regards,
Shixiong Zhu

2015-02-27 23:14 GMT+08:00 rok :

> I'm seeing this java.util.NoSuchElementException: key not found: exception
> pop up sometimes when I run operations on an RDD from multiple threads in a
> python application. It ends up shutting down the SparkContext so I'm
> assuming this is a bug -- from what I understand, I should be able to run
> operations on the same RDD from multiple threads or is this not
> recommended?
>
> I can't reproduce it all the time and I've tried eliminating caching
> wherever possible to see if that would have an effect, but it doesn't seem
> to. Each thread first splits the base RDD and then runs the
> LogisticRegressionWithSGD on the subset.
>
> Is there a workaround to this exception?
java.util.NoSuchElementException: key not found:
I'm seeing this java.util.NoSuchElementException: key not found: exception pop up sometimes when I run operations on an RDD from multiple threads in a python application. It ends up shutting down the SparkContext so I'm assuming this is a bug -- from what I understand, I should be able to run operations on the same RDD from multiple threads, or is this not recommended?

I can't reproduce it all the time and I've tried eliminating caching wherever possible to see if that would have an effect, but it doesn't seem to. Each thread first splits the base RDD and then runs the LogisticRegressionWithSGD on the subset.

Is there a workaround to this exception?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-util-NoSuchElementException-key-not-found-tp21848.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
java.util.NoSuchElementException: key not found
Hi All,

I suspect I am experiencing a bug. I've noticed that while running larger jobs, they occasionally die with the exception "java.util.NoSuchElementException: key not found xyz", where "xyz" denotes the ID of some particular task. I've excerpted the log from one job that died in this way below and attached the full log for reference.

I suspect that my bug is the same as SPARK-2002 (linked below). Is there any reason to suspect otherwise? Is there any known workaround other than not coalescing?

https://issues.apache.org/jira/browse/SPARK-2002
http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3CCAMwrk0=d1dww5fdbtpkefwokyozltosbbjqamsqqjowlzng...@mail.gmail.com%3E

Note that I have been coalescing SchemaRDDs using "srdd = SchemaRDD(srdd._jschema_rdd.coalesce(partitions, False, None), sqlCtx)", the workaround described in this thread:

http://mail-archives.apache.org/mod_mbox/spark-user/201409.mbox/%3ccanr-kkciei17m43-yz5z-pj00zwpw3ka_u7zhve2y7ejw1v...@mail.gmail.com%3E

...
14/09/15 21:43:14 INFO scheduler.TaskSetManager: Starting task 78.0 in stage 551.0 (TID 78738, bennett.research.intel-research.net, PROCESS_LOCAL, 1056 bytes)
...
14/09/15 21:43:15 INFO storage.BlockManagerInfo: Added taskresult_78738 in memory on bennett.research.intel-research.net:38074 (size: 13.0 MB, free: 1560.8 MB)
...
14/09/15 21:43:15 ERROR scheduler.TaskResultGetter: Exception while getting task result
java.util.NoSuchElementException: key not found: 78738
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at scala.collection.AbstractMap.default(Map.scala:58)
    at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
    at org.apache.spark.scheduler.TaskSetManager.handleTaskGettingResult(TaskSetManager.scala:500)
    at org.apache.spark.scheduler.TaskSchedulerImpl.handleTaskGettingResult(TaskSchedulerImpl.scala:348)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:52)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)

I am running the pre-compiled 1.1.0 binaries.

best,
-Brad
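The stack trace above shows the result getter doing a bare HashMap lookup by task ID after the entry has already been cleaned up -- a SPARK-2002-style race. A hypothetical plain-Python analogue of that race (illustrative only, not the actual scheduler code):

```python
# The scheduler's bookkeeping map, keyed by task ID (TID).
task_infos = {78738: "RUNNING"}

def handle_result(tid):
    # A bare task_infos[tid] would raise KeyError (Scala:
    # NoSuchElementException) if cleanup already removed the entry;
    # a defensive .get() lets a late result be dropped gracefully
    # instead of killing the job.
    info = task_infos.get(tid)
    if info is None:
        return "ignoring result for unknown task %d" % tid
    return "result accepted for %s task %d" % (info, tid)

accepted = handle_result(78738)     # entry still present: normal path
task_infos.pop(78738)               # cleanup wins the race
dropped = handle_result(78738)      # late result arrives after removal
```

Here the second call survives only because of the defaulting lookup; with a bare indexing operation it would reproduce the "key not found: 78738" failure.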