Re: Please Help with DecisionTree/FeatureIndexer

2017-12-19 Thread Weichen Xu
Hi, Marco. Do not call any single fit/transform yourself. You only need to call `pipeline.fit`/`pipelineModel.transform`, like the following: val assembler = new VectorAssembler().setInputCols(inputData.columns.filter(_ != "Severity")).setOutputCol("features") val data =
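For reference, a minimal self-contained sketch of that advice: build every stage, hand them all to a Pipeline, and call only pipeline.fit / pipelineModel.transform. The sample data, the StringIndexer label stage, and the DecisionTreeClassifier are illustrative assumptions; only the VectorAssembler setup and the "Severity" column come from the snippet above.

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Illustrative stand-in for the poster's data set.
val inputData = Seq((1.0, 2.0, "High"), (0.5, 1.5, "Low"), (2.5, 0.1, "High"))
  .toDF("f1", "f2", "Severity")

val assembler = new VectorAssembler()
  .setInputCols(inputData.columns.filter(_ != "Severity"))
  .setOutputCol("features")

val labelIndexer = new StringIndexer()
  .setInputCol("Severity")
  .setOutputCol("label")

val dt = new DecisionTreeClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")

// The Pipeline calls fit/transform on each stage in order; no manual calls needed.
val pipeline = new Pipeline().setStages(Array(assembler, labelIndexer, dt))
val model = pipeline.fit(inputData)
val predictions = model.transform(inputData)
predictions.select("Severity", "prediction").show()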

Re: Spark error while trying to spark.read.json()

2017-12-19 Thread Michael Armbrust
- dev java.lang.AbstractMethodError almost always means that you have different libraries on the classpath than at compilation time. In this case I would check to make sure you have the correct version of Scala (and only one version of Scala) on the classpath. On Tue, Dec 19, 2017 at 5:42
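A minimal build.sbt sketch of what "one Scala version, matching libraries" can look like; the versions here are illustrative, not taken from the thread.

// Pin a single Scala version and compile against the same Spark version that
// runs on the cluster; marking Spark "provided" keeps a second copy of the
// jars off the runtime classpath, which is the usual source of AbstractMethodError.
scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.1" % "provided"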

Spark error while trying to spark.read.json()

2017-12-19 Thread satyajit vegesna
Hi All, Can anyone help me with the error below? Exception in thread "main" java.lang.AbstractMethodError at scala.collection.TraversableLike$class.filterNot(TraversableLike.scala:278) at org.apache.spark.sql.types.StructType.filterNot(StructType.scala:98) at

Re: /tmp fills up to 100GB when using a window function

2017-12-19 Thread Vadim Semenov
Until after an action is done (e.g. save/count/reduce) or if you explicitly truncate the DAG by checkpointing. Spark needs to keep all shuffle files because if some task/stage/node fails it'll only need to recompute missing partitions by using already computed parts. On Tue, Dec 19, 2017 at
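A minimal sketch of the "truncate the DAG by checkpointing" suggestion; the checkpoint directory and the data are illustrative.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.sparkContext.setCheckpointDir("/path/with/enough/space")

val shuffled = spark.range(0, 1000000).toDF("id").repartition(200) // writes shuffle files

// checkpoint() materializes the result and cuts the lineage, so Spark no longer
// has to keep the upstream shuffle files around to recover lost partitions.
val truncated = shuffled.checkpoint()
truncated.count()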

Re: What does Blockchain technology mean for Big Data? And how Hadoop/Spark will play role with it?

2017-12-19 Thread Ryan C. Kleck
IMO blockchain won’t be doing anything for big data anytime soon. It is not distributed (it’s “decentralized”). All blockchain data is replicated to ALL nodes in the network. At the moment there is a game involving breeding cats that is already clogging up the Ethereum blockchain. Blockchain might

Re: /tmp fills up to 100GB when using a window function

2017-12-19 Thread Mihai Iacob
When does Spark remove them? Regards, Mihai Iacob, DSX Local - Security, IBM Analytics

Re: /tmp fills up to 100GB when using a window function

2017-12-19 Thread Vadim Semenov
Spark doesn't remove intermediate shuffle files if they're part of the same job. On Mon, Dec 18, 2017 at 3:10 PM, Mihai Iacob wrote: > This code generates files under /tmp...blockmgr... which do not get > cleaned up after the job finishes. > > Anything wrong with the code
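Not from the thread, but related: the blockmgr-* directories land under /tmp because that is the default spark.local.dir. A sketch of pointing the scratch space at a bigger disk; the path is illustrative, and on YARN the node manager's local dirs take precedence over this setting.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.local.dir", "/data/spark-scratch") // blockmgr-* / shuffle files go here
  .getOrCreate()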

Re: What does Blockchain technology mean for Big Data? And how Hadoop/Spark will play role with it?

2017-12-19 Thread Vadim Semenov
I think it means that we can replace HDFS with a blockchain-based FS, and then offload some processing to smart contracts. On Mon, Dec 18, 2017 at 11:59 PM, KhajaAsmath Mohammed < mdkhajaasm...@gmail.com> wrote: > I am looking for same answer too .. will wait for response from other > people > >

Re: NullPointerException while reading a column from the row

2017-12-19 Thread Vadim Semenov
getAs is defined as: def getAs[T](i: Int): T = get(i).asInstanceOf[T] and when you do toString you call Object.toString, which doesn't depend on the type, so the asInstanceOf[T] gets dropped by the compiler, i.e. row.getAs[Int](0).toString -> row.get(0).toString. We can confirm that by writing a simple
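The snippet is cut off, but a simple confirmation along those lines might look like this (my own sketch; the single-row DataFrame and schema are illustrative):

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val schema = StructType(Seq(StructField("ColumnName", IntegerType, nullable = true)))
val df = spark.createDataFrame(spark.sparkContext.parallelize(Seq(Row(null))), schema)
val row = df.head()

// The cast is erased, so this is really row.get(0).toString on a null value:
// row.getAs[Int]("ColumnName").toString   // throws NullPointerException

// Assigning to an Int forces unboxing first, and unboxing null yields 0:
val asInt: Int = row.getAs[Int]("ColumnName") // 0
asInt.toString                                // "0"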

NullPointerException while reading a column from the row

2017-12-19 Thread Anurag Sharma
The following Scala (Spark 1.6) code for reading a value from a Row fails with a NullPointerException when the value is null.
val test = row.getAs[Int]("ColumnName").toString
while this works fine:
val test1 = row.getAs[Int]("ColumnName") // returns 0 for null
val test2 = test1.toString //
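A hedged sketch (not from the thread) of two null-safe ways to read the value; `row` and "ColumnName" are assumed to match the snippet above.

import org.apache.spark.sql.Row

// Check for null before reading the primitive:
def readColumn(row: Row): String = {
  val i = row.fieldIndex("ColumnName")
  if (row.isNullAt(i)) "" else row.getInt(i).toString
}

// Or make the absence explicit with an Option:
def readColumnOpt(row: Row): Option[Int] = {
  val i = row.fieldIndex("ColumnName")
  if (row.isNullAt(i)) None else Some(row.getInt(i))
}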