How to read multiple libsvm files in Spark?

2018-09-20 Thread Md. Rezaul Karim
I'm experiencing "Exception in thread "main" java.io.IOException: Multiple input paths are not supported for libsvm data" exception while trying to read multiple libsvm files using Spark 2.3.0: val URLs = spark.read.format("libsvm").load("url_svmlight.tar/url_svmlight/*.svm") Any other
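A minimal sketch of a workaround, assuming Spark 2.x with spark-mllib on the classpath. In the Spark 2.x sources the multiple-path check appears to sit on the code path that infers the number of features. If so, setting the documented `numFeatures` option may be enough on its own. The sketch does not rely on that. It expands the glob with Hadoop's `FileSystem.globStatus`, reads each file separately with the same `numFeatures` so that all feature vectors have the same size, and unions the results. The object name, method names and the feature count are hypothetical.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.{DataFrame, SparkSession}

object LibsvmMultiLoad {
  // Expand a glob such as "dir/*.svm" into concrete file paths (empty if nothing matches).
  def expandGlob(spark: SparkSession, pattern: String): Seq[String] = {
    val path = new Path(pattern)
    val fs = path.getFileSystem(spark.sparkContext.hadoopConfiguration)
    Option(fs.globStatus(path)).map(_.toSeq).getOrElse(Seq.empty).map(_.getPath.toString)
  }

  // Read every file on its own and union the results. A fixed numFeatures keeps the
  // vector size identical across files and skips per-file feature inference.
  def loadAll(spark: SparkSession, paths: Seq[String], numFeatures: Int): DataFrame = {
    require(paths.nonEmpty, "no libsvm files to load")
    paths
      .map(p => spark.read.format("libsvm").option("numFeatures", numFeatures.toString).load(p))
      .reduce(_ union _)
  }

  def main(args: Array[String]): Unit = {
    // Local mode for a quick try; drop .master(...) when using spark-submit.
    val spark = SparkSession.builder().appName("libsvm-union").master("local[*]").getOrCreate()
    val paths = expandGlob(spark, "url_svmlight.tar/url_svmlight/*.svm")
    // Hypothetical value: use the real dimensionality of your data here.
    val urls = loadAll(spark, paths, numFeatures = 1000)
    println(urls.count())
    spark.stop()
  }
}
```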

Re: Writing a DataFrame is taking too long and too much space

2018-03-09 Thread Md. Rezaul Karim
pre-processing. By the way, I tried using Spark's built-in CSV library too. Best, Md. Rezaul Karim, BSc, MSc Research Scientist, Fraunhofer FIT, Germany Ph.D. Researcher, Information Systems, RWTH Aachen University, Germany eMail: rezaul.ka...@fit.fraunhofer.de

Writing a DataFrame is taking too long and too much space

2018-03-09 Thread Md. Rezaul Karim
esce(1).write.format("com.databricks.spark.csv").save("data/file.csv") Any better suggestion?

Reinforcement Learning with Spark

2018-01-05 Thread Md. Rezaul Karim
Hi All, Is there any reinforcement learning algorithm implemented with Spark, i.e., a link to a GitHub or other open-source project?

SpecificColumnarIterator has grown past JVM limit of 0xFFFF

2017-11-17 Thread Md. Rezaul Karim
of 0xFFFF* I understand that the current implementation cannot handle so many columns. However, is there any workaround for handling a dataset like this?

Re: StringIndexer on several columns in a DataFrame with Scala

2017-10-30 Thread Md. Rezaul Karim
Hi Nick, Both approaches worked, and I realized my silly mistake too. Thank you so much. @Xu, thanks for the update. Best regards, Md. Rezaul Karim, BSc, MSc Researcher, INSIGHT Centre for Data Analytics National University of Ireland, Galway IDA

StringIndexer on several columns in a DataFrame with Scala

2017-10-27 Thread Md. Rezaul Karim
riencing a NullPointerException at for (colName <- featureCol). I am sure I am doing something wrong. Any suggestions?
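One common pattern for indexing several string columns is to create one `StringIndexer` per column and fit them together as a `Pipeline`. Here is a minimal sketch, assuming Spark 2.x `spark.ml`. The `_idx` output suffix, the object name and the toy columns are hypothetical.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.{DataFrame, SparkSession}

object MultiColumnIndexer {
  // Build one StringIndexer per input column and fit them all as a single Pipeline.
  def indexColumns(df: DataFrame, cols: Seq[String]): DataFrame = {
    val indexers: Array[StringIndexer] = cols.map { c =>
      new StringIndexer()
        .setInputCol(c)
        .setOutputCol(s"${c}_idx") // hypothetical naming convention
    }.toArray
    val model: PipelineModel = new Pipeline().setStages(indexers).fit(df)
    model.transform(df)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("multi-indexer").master("local[*]").getOrCreate()
    import spark.implicits._
    val df = Seq(("a", "x"), ("b", "y"), ("a", "y")).toDF("c1", "c2")
    indexColumns(df, Seq("c1", "c2")).show()
    spark.stop()
  }
}
```

By default `StringIndexer` orders labels by frequency, so the most frequent value of each column gets index 0.0.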

WARN: Truncated the string representation with df.describe()

2017-10-16 Thread Md. Rezaul Karim
Hi, When I try to view the statistics of a DataFrame using the df.describe() method, I get the following WARN and, as a result, nothing gets printed: 17/10/16 18:37:54 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted
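In Spark 2.x this warning is controlled by the `spark.debug.maxToStringFields` setting. That setting is read from the Spark configuration, so one option is to raise it when the session is created. Separately, `describe` accepts a list of column names, so you can summarize a very wide DataFrame in smaller groups of columns. Here is a sketch under those assumptions. The limit of 1000 and the batch size are arbitrary, and the object name is hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DescribeWide {
  // Print summary statistics for a wide DataFrame in groups of columns,
  // so each describe() call builds a small plan and prints a readable table.
  def describeInBatches(df: DataFrame, batchSize: Int = 20): Unit =
    df.columns.grouped(batchSize).foreach { batch =>
      df.describe(batch: _*).show(false)
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("describe-wide")
      .master("local[*]")
      // Must be set before the SparkContext starts to take effect.
      .config("spark.debug.maxToStringFields", 1000L)
      .getOrCreate()
    import spark.implicits._
    val df = Seq((1, 2.0, "a"), (3, 4.0, "b")).toDF("x", "y", "z")
    describeInBatches(df, batchSize = 2)
    spark.stop()
  }
}
```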

Bayesian network with Spark

2017-09-11 Thread Md. Rezaul Karim
Hi All, I am planning to use a Bayesian network to integrate and infer the links between miRNA and proteins based on their expression. Is there any Bayesian network implementation in Spark that I could adapt to my data?

Re: [Spark ML] LogisticRegressionWithSGD

2017-06-29 Thread Md. Rezaul Karim
+1 On Jun 29, 2017 10:46 PM, "Kevin Quinn" wrote: > Hello, > > I'd like to build a system that leverages semi-online updates and I wanted > to use stochastic gradient descent. However, after looking at the > documentation it looks like that method is deprecated. Is there

RE: IDE for python

2017-06-28 Thread Md. Rezaul Karim
By the way, PyCharm from JetBrains also has a Community Edition, which is free and open source. Moreover, if you are a student, you can use the Professional Edition through the student program as well. For more, see https://www.jetbrains.com/student/ On Jun 28, 2017 11:18 AM, "Sotola, Radim"

Re: Could you please add book info to the Spark website?

2017-06-25 Thread Md. Rezaul Karim
Thanks, Sean. I will ask them to do so.

Could you please add book info to the Spark website?

2017-06-25 Thread Md. Rezaul Karim
Hi Sean, Last time, you helped me add book information (in the Books section) to this page: https://spark.apache.org/documentation.html. Could you please add information about another book? Here is the necessary information about the book: *Title*: Scala and Spark for Big Data Analytics *Authors*: Md. Rezaul Karim

Re: How to convert Spark MLlib vector to ML Vector?

2017-04-10 Thread Md. Rezaul Karim
Hi Yan, Ryan, and Nick, For a special use case, I had to use the RDD-based Spark MLlib, which eventually did not work, so I switched to Spark ML later on. Thanks for your support, guys.

How to convert Spark MLlib vector to ML Vector?

2017-04-09 Thread Md. Rezaul Karim
setInputCol("features") .setOutputCol("pcaFeatures") .setK(100) .fit(trainingDF) /// GETTING EXCEPTION HERE Could someone please help me solve the problem?
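In Spark 2.x, the RDD-based `org.apache.spark.mllib.linalg.Vector` and the DataFrame-based `org.apache.spark.ml.linalg.Vector` are different types. `spark.ml` estimators such as `PCA` expect the latter. Spark provides `asML` for converting a single old vector and `MLUtils.convertVectorColumnsToML` for converting DataFrame columns. Here is a sketch, assuming the column is called `features` as in the snippet. The toy data, `k = 2` and the object name are made up.

```scala
import org.apache.spark.ml.feature.PCA
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.{DataFrame, SparkSession}

object VectorConversion {
  // Convert an mllib-vector column named "features" to the ml vector type used by spark.ml.
  def toMLFeatures(df: DataFrame): DataFrame =
    MLUtils.convertVectorColumnsToML(df, "features")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("vector-conversion").master("local[*]").getOrCreate()
    import spark.implicits._

    // A single old-style vector can be converted directly with asML.
    val single: org.apache.spark.ml.linalg.Vector = OldVectors.dense(1.0, 2.0).asML
    println(single)

    // Toy DataFrame holding old-style (mllib) vectors.
    val oldDF = Seq(
      (0.0, OldVectors.dense(1.0, 0.0, 3.0)),
      (1.0, OldVectors.dense(2.0, 1.0, 0.0)),
      (0.0, OldVectors.dense(0.0, 4.0, 1.0))
    ).toDF("label", "features")

    val trainingDF = toMLFeatures(oldDF)
    val pca = new PCA().setInputCol("features").setOutputCol("pcaFeatures").setK(2).fit(trainingDF)
    pca.transform(trainingDF).show(false)
    spark.stop()
  }
}
```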

Research paper used in GraphX

2017-03-31 Thread Md. Rezaul Karim
Hi All, Could anyone please tell me which research paper(s) were used to implement metrics such as strongly connected components, PageRank, triangle count, closeness centrality, and clustering coefficient in Spark GraphX?

Re: Question on Spark's graph libraries

2017-03-10 Thread Md. Rezaul Karim
+1

Re: Debugging Spark application

2017-02-16 Thread Md. Rezaul Karim
pts to eclipse *I think* Regards, Sam. On Thu, 16 Feb 2017 at 22:00, Md. Rezaul Karim <rezaul.ka...@insight-centre.org> wrote: > Hi, > I was looking for some URLs/documents for getting started on debugging Spark applicatio

Debugging Spark application

2017-02-16 Thread Md. Rezaul Karim
Hi, I was looking for some URLs/documents for getting started on debugging Spark applications. I prefer to develop Spark applications in Scala on Eclipse and then package the application as a jar before submitting it. Kind regards, Reza

Re: EC2 script is missing in Spark 2.0.0~2.1.0

2017-02-11 Thread Md. Rezaul Karim
Thanks for the great help. Appreciated!

Re: EC2 script is missing in Spark 2.0.0~2.1.0

2017-02-11 Thread Md. Rezaul Karim
Hi Takeshi, Now I understand that the spark-ec2 script was moved to AMPLab. How can I use it from there, i.e., what is the new location/URL? Alternatively, can I use the same script provided with earlier Spark releases?

EC2 script is missing in Spark 2.0.0~2.1.0

2017-02-11 Thread Md. Rezaul Karim

Re: How to specify "verbose GC" in Spark submit?

2017-02-06 Thread Md. Rezaul Karim
Thanks, Bryan. Got your point.

How to specify "verbose GC" in Spark submit?

2017-02-06 Thread Md. Rezaul Karim
Dear All, Is there any way to enable verbose GC, i.e., “-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps”, with spark-submit?
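One way, assuming the standard Spark configuration properties, is to put executor JVM flags into `spark.executor.extraJavaOptions`. Driver flags must be known before the driver JVM starts. In client mode, they are therefore usually passed with spark-submit's `--driver-java-options`, or set as `spark.driver.extraJavaOptions` in `spark-defaults.conf`, rather than in code. The flags below are the ones quoted above and use Java 8 syntax. On Java 9 and later, GC logging moved to unified logging (`-Xlog:gc*`). The object and method names are hypothetical.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object VerboseGcConf {
  // GC logging flags from the question above (Java 8 syntax).
  val GcFlags = "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

  // Executor JVMs are launched after the application starts, so their flags can be set here.
  // Note: this replaces any extraJavaOptions value already present in conf.
  def withExecutorGcLogging(conf: SparkConf): SparkConf =
    conf.set("spark.executor.extraJavaOptions", GcFlags)

  // Intended to be launched with spark-submit, for example:
  //   spark-submit --driver-java-options "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" ...
  def main(args: Array[String]): Unit = {
    val conf = withExecutorGcLogging(new SparkConf().setAppName("verbose-gc"))
    val spark = SparkSession.builder().config(conf).getOrCreate()
    // ... run the job; GC output goes to each executor's stdout log.
    spark.stop()
  }
}
```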

Re: DAG Visualization option is missing on Spark Web UI

2017-01-30 Thread Md. Rezaul Karim
Hi Mark, That worked for me! Thanks a million.

Pruning decision tree in Spark

2017-01-30 Thread Md. Rezaul Karim

Re: DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Md. Rezaul Karim
that I am experiencing the same issue with Spark 2.x (i.e., 2.0.0, 2.0.1, 2.0.2, and 2.1.0). Please refer to the attached screenshot of the UI I am seeing on my machine. Any suggestions?

DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Md. Rezaul Karim
Hi All, I am running a Spark job written in Scala with Spark 2.1.0 on my local machine. However, I do not see any "*DAG Visualization*" option at http://localhost:4040/jobs/. Any suggestions?

Re: Text

2017-01-27 Thread Md. Rezaul Karim
Some operations, such as map, filter, flatMap, and coalesce (with shuffle = false), usually preserve the order. However, sortBy, reduceByKey, partitionBy, join, etc. do not.
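Here is a small local check of the claim above, assuming a local `SparkContext`. Narrow transformations such as `map` and `filter`, and `coalesce` without a shuffle, keep elements in partition order, so `collect()` returns them in the original order. This illustrates the behaviour on one example. It is not a guarantee for every operation or cluster setup. The object name is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object OrderCheck {
  // Apply only order-preserving (narrow) operations and collect the result.
  def narrowPipeline(sc: SparkContext, n: Int): Array[Int] =
    sc.parallelize(1 to n, numSlices = 4)
      .map(_ * 2)         // element-wise, keeps each element in place
      .filter(_ % 3 != 0) // drops elements, keeps the relative order of the rest
      .coalesce(2)        // shuffle = false by default: merges neighbouring partitions
      .collect()

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("order-check").setMaster("local[*]"))
    println(narrowPipeline(sc, 10).mkString(", "))
    sc.stop()
  }
}
```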

Re: How to tune number of tasks

2017-01-26 Thread Md. Rezaul Karim
argument as true. yourRDD.coalesce(1).saveAsTextFile("data/output") Hope that helps.
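You can check the difference between the two ways of reducing partitions directly. Here is a sketch, assuming a local `SparkContext`:

- `coalesce(n)` merges existing partitions without a shuffle, so it can only lower the partition count.
- `coalesce(n, shuffle = true)` redistributes the data, which is what `repartition(n)` does.
- Writing after `coalesce(1)` produces a single part file.

The output directory comes from the snippet above. `saveAsTextFile` fails if that directory already exists. The object and helper names are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PartitionCount {
  // Number of partitions after coalescing; without a shuffle the count can only go down.
  def partitionsAfter[T](rdd: RDD[T], n: Int, shuffle: Boolean): Int =
    rdd.coalesce(n, shuffle).getNumPartitions

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partitions").setMaster("local[*]"))
    val yourRDD = sc.parallelize(1 to 1000, numSlices = 100)

    println(partitionsAfter(yourRDD, 10, shuffle = false))  // fewer partitions, no shuffle
    println(partitionsAfter(yourRDD, 200, shuffle = false)) // stays at 100 without a shuffle
    println(partitionsAfter(yourRDD, 200, shuffle = true))  // same as yourRDD.repartition(200)

    // saveAsTextFile returns Unit, so call it as a statement instead of assigning the result.
    yourRDD.coalesce(1).saveAsTextFile("data/output")
    sc.stop()
  }
}
```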

How to reduce number of tasks and partitions in Spark job?

2017-01-26 Thread Md. Rezaul Karim

Re: "Unable to load native-hadoop library for your platform" while running Spark jobs

2017-01-19 Thread Md. Rezaul Karim
Thanks, Sean. I will explore more online.

"Unable to load native-hadoop library for your platform" while running Spark jobs

2017-01-19 Thread Md. Rezaul Karim
.path=$HADOOP_HOME/lib/native" My Spark job executes successfully and writes the results to a file at the end; however, I am not getting any logs to track its progress. Could someone help me solve this problem?

Parsing RDF data with Spark

2017-01-18 Thread Md. Rezaul Karim
Hi All, Is there any way to parse Linked Data in RDF (.n3, .ttl, .nq, .nt) format with Spark? Kind regards, Reza

Re: Old version of Spark [v1.2.0]

2017-01-15 Thread Md. Rezaul Karim
Hi Ayan, Thanks a million.

Old version of Spark [v1.2.0]

2017-01-15 Thread Md. Rezaul Karim
Hi, I am looking for Spark version 1.2.0. I tried to download it from the Spark website, but it's no longer available there. Any suggestions?

H2O DataFrame to Spark RDD/DataFrame

2017-01-12 Thread Md. Rezaul Karim
docs/booklets/SparklingWaterVignette.pdf> However, it discusses how to convert a Spark RDD or DataFrame to an H2O DataFrame, but not vice versa.

Re: How to save spark-ML model in Java?

2017-01-12 Thread Md. Rezaul Karim
rwrite().save("output/NBModel") Hope that helps.
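Here is a Scala sketch of a save/load round trip with the `spark.ml` writer API, assuming a `NaiveBayesModel`. The path is the one from the snippet, and the tiny training set is made up. `overwrite()` replaces an existing directory instead of failing. From Java, the same chain is `model.write().overwrite().save(path)`. The object name and the `save` helper are hypothetical.

```scala
import org.apache.spark.ml.classification.{NaiveBayes, NaiveBayesModel}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object SaveLoadModel {
  // overwrite() lets a second save to the same path succeed instead of throwing.
  def save(model: NaiveBayesModel, path: String): Unit =
    model.write.overwrite().save(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("save-load").master("local[*]").getOrCreate()
    import spark.implicits._

    // Tiny made-up training set; multinomial NaiveBayes needs non-negative features.
    val train = Seq(
      (0.0, Vectors.dense(1.0, 0.0)),
      (1.0, Vectors.dense(0.0, 1.0))
    ).toDF("label", "features")

    val model = new NaiveBayes().fit(train)
    save(model, "output/NBModel")

    val restored = NaiveBayesModel.load("output/NBModel")
    restored.transform(train).select("label", "prediction").show()
    spark.stop()
  }
}
```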

Re: Machine Learning in Spark 1.6 vs Spark 2.0

2017-01-09 Thread Md. Rezaul Karim
Hi, I am currently using Spark 2.1.0 for ML and so far have not experienced any critical issues. I would say it is much more stable than Spark 2.0.1/2.0.2.

Re: Machine Learning in Spark 1.6 vs Spark 2.0

2017-01-09 Thread Md. Rezaul Karim
g, etc. These features will help make your machine learning both scalable and easy.

Re: Issue with SparkR setup on RStudio

2017-01-04 Thread Md. Rezaul Karim
Cheung, The problem was solved after switching from Windows to a Linux environment. Thanks.

RBackendHandler Error while running ML algorithms with SparkR on RStudio

2017-01-03 Thread Md. Rezaul Karim
icDF
nbModel <- spark.naiveBayes(nbDF, Survived ~ Class + Sex + Age)
# Model summary
summary(nbModel)
# Prediction
nbPredictions <- predict(nbModel, nbTestDF)
showDF(nbPredictions)
Could someone please help me get rid of this error?

Re: Issue with SparkR setup on RStudio

2017-01-02 Thread Md. Rezaul Karim
Hello Cheung, Happy New Year! No, I did not configure Hive on my machine. I even tried without setting HADOOP_HOME, but I get the same error.

Issue with SparkR setup on RStudio

2016-12-29 Thread Md. Rezaul Karim
at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at scala.collection.Traversabl Any kind of help would be appreciated.

Re: Running spark from Eclipse and then Jar

2016-12-10 Thread Md. Rezaul Karim
"*db.lck*" file, which was preventing the jar from being executed from the command line. I deleted that file, packaged my project as a jar again, and the problem was finally resolved.

Re: Random Forest hangs without trace of error

2016-12-09 Thread Md. Rezaul Karim
I had a similar experience last week, and I could not find any error trace either. I did the following to get rid of the problem: i) downgraded to Spark 2.0.0, and ii) decreased the values of maxBins and maxDepth. Additionally, make sure that you set featureSubsetStrategy to "auto" to let the
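The settings mentioned above correspond to setters on `spark.ml`'s `RandomForestClassifier`. Here is a sketch, assuming Spark 2.x. The concrete values are placeholders to tune, not recommendations. The column names are the `spark.ml` defaults, and the object name is hypothetical.

```scala
import org.apache.spark.ml.classification.RandomForestClassifier

object RandomForestSettings {
  // Smaller maxBins/maxDepth make each tree cheaper to build. "auto" derives the
  // per-node feature subset size from the number of trees (all features for a single
  // tree, sqrt(numFeatures) for a classification forest).
  def smallerForest(numTrees: Int = 20): RandomForestClassifier =
    new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(numTrees)
      .setMaxDepth(4) // default is 5
      .setMaxBins(16) // default is 32
      .setFeatureSubsetStrategy("auto")

  def main(args: Array[String]): Unit =
    println(smallerForest().explainParams())
}
```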

"Failed to find data source: libsvm" while running Spark application with jar

2016-12-08 Thread Md. Rezaul Karim
the input file. Any kind of help is appreciated.

Re: Running spark from Eclipse and then Jar

2016-12-07 Thread Md. Rezaul Karim
ed to find data source: libsvm.* The application works fine in Eclipse. However, with the packaged jar file, I am getting the above error, which is really weird!

Re: Running spark from Eclipse and then Jar

2016-12-07 Thread Md. Rezaul Karim
single An example pom.xml file has been attached for your reference. Feel free to reuse it.

Pruning decision tree to create an optimal tree

2016-12-07 Thread Md. Rezaul Karim

Re: How to compute the recall and F1-score in Linear Regression based model

2016-12-06 Thread Md. Rezaul Karim
the similar metrics using the linear regression based model for a multiclass or binary-class dataset.

How to compute the recall and F1-score in Linear Regression based model

2016-12-06 Thread Md. Rezaul Karim
List()) { count++; } System.out.println("precision: " + (double) (count * 100) / predictions.count()); Now, I would like to compute other evaluation metrics, such as *Recall* and *F1-score*. How could I do that?
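You can compute recall and F1 from the same (prediction, label) pairs, assuming the predictions have already been turned into class labels. Here is a dependency-free Scala sketch of the per-class definitions:

- precision = TP / (TP + FP)
- recall = TP / (TP + FN)
- F1 = 2 · precision · recall / (precision + recall)

The quantity printed as "precision" in the snippet above (correct predictions over all predictions) is what is usually called accuracy. On RDDs, Spark MLlib's `MulticlassMetrics` (`precision(label)`, `recall(label)`, `fMeasure(label)`, `accuracy`) computes the same quantities. The object and method names below are hypothetical.

```scala
object ClassMetrics {
  final case class PerClass(precision: Double, recall: Double, f1: Double)

  // pairs: (prediction, label), both encoded as Doubles as in Spark ML/MLlib.
  def perClass(pairs: Seq[(Double, Double)], cls: Double): PerClass = {
    val tp = pairs.count { case (p, l) => p == cls && l == cls }
    val fp = pairs.count { case (p, l) => p == cls && l != cls }
    val fn = pairs.count { case (p, l) => p != cls && l == cls }
    val precision = if (tp + fp == 0) 0.0 else tp.toDouble / (tp + fp)
    val recall = if (tp + fn == 0) 0.0 else tp.toDouble / (tp + fn)
    val f1 = if (precision + recall == 0.0) 0.0 else 2 * precision * recall / (precision + recall)
    PerClass(precision, recall, f1)
  }

  // Fraction of correct predictions (what the quoted snippet prints as "precision").
  def accuracy(pairs: Seq[(Double, Double)]): Double =
    if (pairs.isEmpty) 0.0 else pairs.count { case (p, l) => p == l }.toDouble / pairs.size

  def main(args: Array[String]): Unit = {
    val pairs = Seq((1.0, 1.0), (1.0, 0.0), (0.0, 0.0), (0.0, 1.0), (1.0, 1.0))
    println(perClass(pairs, 1.0)) // TP = 2, FP = 1, FN = 1 for class 1.0
    println(accuracy(pairs))      // 3 of 5 predictions are correct
  }
}
```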

Multilabel classification with Spark MLlib

2016-11-29 Thread Md. Rezaul Karim
appreciated.

Multilabel classification with Spark MLlib

2016-11-25 Thread Md. Rezaul Karim
appreciated.