Hi, I am new to PredictionIO 0.12.0 (Elasticsearch 5.2.1, HBase 1.2.6, Spark 2.6.0), running on hardware with 244 GB RAM and 32 cores. I have uploaded roughly 1 million events, each containing about 30k features. While uploading I could see the HBase disk usage increasing, and after all the events were uploaded the HBase disk size was 567 GB.
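Each event was posted from my upload script to the event server's REST endpoint, roughly like the sketch below (the access key, entity fields, property values, and timestamp are placeholders, and it assumes the default event server port 7070):

    # Post one event to the PredictionIO event server (all values here are placeholders).
    curl -i -X POST "http://localhost:7070/events.json?accessKey=$ACCESS_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "event": "my_event",
        "entityType": "user",
        "entityId": "u1",
        "properties": { "feature_1": 0.42, "feature_2": 1.0 },
        "eventTime": "2017-11-17T00:00:00.000Z"
      }'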
To verify the upload, I ran the following commands:

- pio-shell --with-spark --conf spark.network.timeout=10000000 --driver-memory 30G --executor-memory 21G --num-executors 7 --executor-cores 3 --conf spark.driver.maxResultSize=4g --conf spark.executor.heartbeatInterval=10000000
- import org.apache.predictionio.data.store.PEventStore
- val eventsRDD = PEventStore.find(appName="test")(sc)
- val c = eventsRDD.count()

This shows the event count as 18944, nowhere near the ~1 million events I uploaded. After that, from the script through which I uploaded the events, I randomly queried some of them by their event IDs, and each one was returned. So I don't know how to make sure that all the events I uploaded are actually present in the app.
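One more check I can run in pio-shell is to break the count down by event name, to see what is actually stored per event type. A minimal sketch along the lines of the commands above (countsByName is just a val name I chose here):

    import org.apache.predictionio.data.store.PEventStore

    // Load all events for the app and count them per event name.
    val eventsRDD = PEventStore.find(appName = "test")(sc)
    val countsByName = eventsRDD.map(e => (e.event, 1L)).reduceByKey(_ + _)
    // Print each event name with its stored count.
    countsByName.collect().foreach { case (name, n) => println(s"$name: $n") }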

>>
>> I got the error:
>>
>> ERROR Storage$: Error initializing storage client for source ELASTICSEARCH
>> java.lang.ClassNotFoundException: elasticsearch.StorageClient
>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>   at java.lang.Class.forName0(Native Method)
>>   at java.lang.Class.forName(Class.java:264)
>>   at org.apache.predictionio.data.storage.Storage$.getClient(Storage.scala:228)
>>   at org.apache.predictionio.data.storage.Storage$.org$apache$predictionio$data$storage$Storage$$updateS2CM(Storage.scala:254)
>>   at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:215)
>>   at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:215)
>>   at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
>>   at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
>>   at org.apache.predictionio.data.storage.Storage$.sourcesToClientMeta(Storage.scala:215)
>>   at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:284)
>>   at org.apache.predictionio.data.storage.Storage$.getDataObjectFromRepo(Storage.scala:269)
>>   at org.apache.predictionio.data.storage.Storage$.getMetaDataApps(Storage.scala:387)
>>   at org.apache.predictionio.data.store.Common$.appsDb$lzycompute(Common.scala:27)
>>   at org.apache.predictionio.data.store.Common$.appsDb(Common.scala:27)
>>   at org.apache.predictionio.data.store.Common$.appNameToId(Common.scala:32)
>>   at org.apache.predictionio.data.store.PEventStore$.find(PEventStore.scala:71)
>>   at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
>>   at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
>>   at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
>>   at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
>>   at $line19.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
>>   at $line19.$read$$iwC$$iwC$$iwC.<init>(<console>:41)
>>   at $line19.$read$$iwC$$iwC.<init>(<console>:43)
>>   at $line19.$read$$iwC.<init>(<console>:45)
>>   at $line19.$read.<init>(<console>:47)
>>   at $line19.$read$.<init>(<console>:51)
>>   at $line19.$read$.<clinit>(<console>)
>>   at $line19.$eval$.<init>(<console>:7)
>>   at $line19.$eval$.<clinit>(<console>)
>>   at $line19.$eval.$print(<console>)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:498)
>>   at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>>   at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>   at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>>   at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>>   at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>>   at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>>   at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>>   at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>   at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>   at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>>   at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>>   at org.apache.spark.repl.Main$.main(Main.scala:31)
>>   at org.apache.spark.repl.Main.main(Main.scala)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:498)
>>   at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>>   at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>> org.apache.predictionio.data.storage.StorageClientException: Data source ELASTICSEARCH was not properly initialized.
>>   at org.apache.predictionio.data.storage.Storage$$anonfun$10.apply(Storage.scala:285)
>>   at org.apache.predictionio.data.storage.Storage$$anonfun$10.apply(Storage.scala:285)
>>   at scala.Option.getOrElse(Option.scala:120)
>>   at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:284)
>>   at org.apache.predictionio.data.storage.Storage$.getDataObjectFromRepo(Storage.scala:269)
>>   at org.apache.predictionio.data.storage.Storage$.getMetaDataApps(Storage.scala:387)
>>   at org.apache.predictionio.data.store.Common$.appsDb$lzycompute(Common.scala:27)
>>   at org.apache.predictionio.data.store.Common$.appsDb(Common.scala:27)
>>   at org.apache.predictionio.data.store.Common$.appNameToId(Common.scala:32)
>>   at org.apache.predictionio.data.store.PEventStore$.find(PEventStore.scala:71)
>>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
>>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
>>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
>>   at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
>>   at $iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
>>   at $iwC$$iwC$$iwC.<init>(<console>:41)
>>   at $iwC$$iwC.<init>(<console>:43)
>>   at $iwC.<init>(<console>:45)
>>   at <init>(<console>:47)
>>   at .<init>(<console>:51)
>>   at .<clinit>(<console>)
>>   at .<init>(<console>:7)
>>   at .<clinit>(<console>)
>>   at $print(<console>)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:498)
>>   at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>>   at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>   at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>>   at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>>   at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>>   at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>>   at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>>   at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>   at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>   at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>>   at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>>   at org.apache.spark.repl.Main$.main(Main.scala:31)
>>   at org.apache.spark.repl.Main.main(Main.scala)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:498)
>>   at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>>   at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>> I have verified through pio status and the Elasticsearch health commands that the Elasticsearch server is running. Can someone please tell me how to resolve this issue?
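>> For reference, the storage section of my conf/pio-env.sh is set up roughly like the sketch below (hosts, ports, and paths here are placeholders rather than my exact values):
>>
>>     # Metadata repository backed by Elasticsearch; event data stays in HBase.
>>     PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
>>     PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
>>     PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
>>     PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
>>     PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
>>     PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
>>     PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
>>     PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
>>     PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.2.6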
