Re: 'requirement failed: OneHotEncoderModel expected x categorical values for input column label, but the input column had metadata specifying n values.'

2019-11-05 Thread Mina Aslani
abel values are +-+ |label| +-+ | 0.0| | 1.0| +-+ How come it throws java.lang.IllegalArgumentException: requirement failed: OneHotEncoderModel expected 2 categorical values for input column label, but the input column had metadata specifying 3 values. in MultilayerPerceptronClassifie

'requirement failed: OneHotEncoderModel expected x categorical values for input column label, but the input column had metadata specifying n values.'

2019-11-05 Thread Mina Aslani
ing n values. Using LogisticRegression, RandomForestClassifier or LinearRegression works fine for the same data and OneHotEncoderEstimator. Any insight on how to resolve this? Regards, Mina
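OneHotEncoderModel records, at fit time, how many categories each input column has, and then requires that the category count declared in the input column's ML-attribute metadata match it; the error above is that requirement check failing. A pure-Python sketch of the check (not Spark's actual code, just the logic it enforces):

```python
def validate_categories(expected: int, metadata_num_values: int, column: str) -> None:
    """Mimics the requirement check OneHotEncoderModel performs: the category
    count recorded in the column's metadata must match the count the model
    saw when it was fitted."""
    if expected != metadata_num_values:
        raise ValueError(
            f"requirement failed: OneHotEncoderModel expected {expected} "
            f"categorical values for input column {column}, but the input "
            f"column had metadata specifying {metadata_num_values} values."
        )

# Matching counts pass silently; a mismatch reproduces the reported error.
validate_categories(2, 2, "label")
try:
    validate_categories(2, 3, "label")
except ValueError as e:
    print(e)
```

The usual remedy is to make metadata and model agree: refit the encoder on the exact DataFrame being transformed, or re-run the same fitted StringIndexer model on the new data so the label column's metadata matches what the encoder was trained with.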

Do we need to kill a spark job every time we change and deploy it?

2018-11-28 Thread Mina Aslani
Hi, I have a question for you. Do we need to kill a spark job every time we change and deploy it to cluster? Or, is there a way for Spark to automatically pick up the recent jar version? Best regards, Mina

Re: Time-Series Forecasting

2018-10-01 Thread Mina Aslani
Thank you very much, really appreciate the information. Kindest regards, Mina On Sat, Sep 29, 2018 at 9:42 PM Peyman Mohajerian wrote: > Here's a blog on Flint: > https://databricks.com/blog/2018/09/11/introducing-flint-a-time-series-library-for-apache-spark.html > I don

Re: Time-Series Forecasting

2018-09-19 Thread Mina Aslani
/seasonality) - AR Model/MA Model/Combined Model (e.g. ARMA, ARIMA) - ACF (Autocorrelation Function)/PACF (Partial Autocorrelation Function) - Recurrent Neural Network (LSTM: Long Short Term Memory) Kindest regards, Mina On Wed, Sep 19, 2018 at 12:55 PM Jörn Franke wrote: > What functional
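Of the classical diagnostics listed in this thread, the ACF is simple enough to sketch without any library: it is the correlation of a series with a lagged copy of itself, normalized by the lag-0 variance. A minimal plain-Python version (illustrative only, not a Spark API):

```python
def acf(series, max_lag):
    """Sample autocorrelation function: for each lag k, the covariance of
    the series with itself shifted by k, divided by the lag-0 variance."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    out = []
    for lag in range(max_lag + 1):
        cov = sum((series[t] - mean) * (series[t - lag] - mean)
                  for t in range(lag, n))
        out.append(cov / var)
    return out

# A strictly increasing series is strongly autocorrelated at short lags;
# lag 0 is always exactly 1.0.
print(acf([1, 2, 3, 4, 5, 6, 7, 8], max_lag=2))
```

In practice statsmodels' `acf`/`pacf` compute this (plus confidence intervals), and the lag at which the PACF cuts off is the usual guide for choosing the AR order in an ARMA/ARIMA model.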

Re: Time-Series Forecasting

2018-09-19 Thread Mina Aslani
Hi, I saw spark-ts <https://github.com/sryza/spark-timeseries>; however, it looks like it is no longer under active development. I would really appreciate your insight. Kindest regards, Mina On Wed, Sep 19, 2018 at 12:01 PM Mina Aslani wrote: > Hi, > I have a question for yo

Time-Series Forecasting

2018-09-19 Thread Mina Aslani
Hi, I have a question for you. Do we have any Time-Series Forecasting library in Spark? Best regards, Mina

How to get MultilayerPerceptronClassifier model parameters?

2018-08-10 Thread Mina Aslani
object has no attribute 'getSeed'/'getTol'/'getMaxIter'. Your insight is appreciated. Best regards, Mina

MultilayerPerceptronClassifier

2018-08-09 Thread Mina Aslani
te for the best model that I can get is the layers. Any idea? Best regards, Mina

Semi-Supervised self-training (e.g. partial fitting)

2018-06-27 Thread Mina Aslani
Hi, Is partial fitting/self-training available for a classifier (e.g. Regression) in Apache Spark? Best regards, Mina
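Spark ML has no built-in partial_fit or self-training; the common workaround is a driver-side loop: fit on the labeled data, predict the unlabeled pool, promote high-confidence predictions into the labeled set, and refit. A pure-Python sketch of that loop using a toy 1-D threshold classifier (all function names here are hypothetical, invented for illustration):

```python
def fit_threshold(points, labels):
    """Toy classifier: decision threshold at the midpoint of the class means."""
    pos = [p for p, y in zip(points, labels) if y == 1]
    neg = [p for p, y in zip(points, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def self_train(labeled, unlabeled, margin, rounds=5):
    """Self-training loop: repeatedly promote confidently classified
    unlabeled points (those far from the decision threshold) into the
    labeled set, then refit."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        thr = fit_threshold([p for p, _ in labeled], [y for _, y in labeled])
        confident = [p for p in pool if abs(p - thr) >= margin]
        if not confident:
            break
        labeled += [(p, int(p > thr)) for p in confident]
        pool = [p for p in pool if abs(p - thr) < margin]
    return fit_threshold([p for p, _ in labeled], [y for _, y in labeled])

thr = self_train([(0.0, 0), (10.0, 1)], [1.0, 2.0, 8.0, 9.0], margin=2.0)
print(thr)
```

With a Spark classifier the same loop would fit an estimator on the labeled DataFrame, transform the unlabeled one, and filter on the probability column to select the points to promote.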

java.lang.IllegalArgumentException: requirement failed: BLAS.dot(x: Vector, y:Vector) was given Vectors with non-matching sizes

2018-05-15 Thread Mina Aslani
was given Vectors with non-matching sizes. I know the cause: the new test data does not have the same vector size as the trained model. However, I would like to know how I can resolve it. What is the suggested workaround? I really appreciate your quick response. Best regards, Mina
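The exception comes from a defensive size check in the dot product, which a plain-Python sketch makes obvious (illustrative, not Spark's actual BLAS wrapper):

```python
def dot(x, y):
    """Dot product with the same defensive requirement BLAS.dot enforces:
    both vectors must have the same length."""
    if len(x) != len(y):
        raise ValueError(
            "requirement failed: BLAS.dot(x: Vector, y: Vector) was given "
            f"Vectors with non-matching sizes ({len(x)} vs {len(y)})"
        )
    return sum(a * b for a, b in zip(x, y))

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

The usual fix is not in the math but in the pipeline: transform the test data with the same fitted feature-engineering models (StringIndexer, OneHotEncoder, VectorAssembler, etc.) that produced the training vectors, so both sides end up with identical vector dimensions.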

Re: How to use StringIndexer for multiple input /output columns in Spark Java

2018-05-15 Thread Mina Aslani
: OneHotEncoderEstimator - java.lang.NoSuchMethodError: org.apache.spark.sql.Dataset.withColumns Regards, Mina On Tue, May 15, 2018 at 2:37 AM, Nick Pentreath wrote: > Multi column support for StringIndexer didn’t make it into Spark 2.3.0 > > The PR is still in progress I think - should be

OneHotEncoderEstimator - java.lang.NoSuchMethodError: org.apache.spark.sql.Dataset.withColumns

2018-05-15 Thread Mina Aslani
t working. Also, OneHotEncoder is deprecated. I really appreciate your quick response. Regards, Mina

Re: How to use StringIndexer for multiple input /output columns in Spark Java

2018-05-14 Thread Mina Aslani
Please take a look at the api doc: https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/ml/feature/StringIndexer.html On Mon, May 14, 2018 at 4:30 PM, Mina Aslani wrote: > Hi, > > There is no SetInputCols/SetOutputCols for StringIndexer in Spark java. > How multiple

How to use StringIndexer for multiple input /output columns in Spark Java

2018-05-14 Thread Mina Aslani
Hi, There is no setInputCols/setOutputCols for StringIndexer in Spark Java. How can multiple input/output columns be specified then? Regards, Mina
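Before multi-column support landed (later Spark releases added setInputCols to StringIndexer), the standard workaround was to build one StringIndexer per column and chain them in a Pipeline. What each indexer learns is a frequency-ordered label-to-index map; a plain-Python simulation of that per-column loop (these helper names are hypothetical, not a Spark API):

```python
from collections import Counter

def fit_string_indexer(values):
    """Simulates one StringIndexer: labels are indexed by descending
    frequency, so the most frequent label gets index 0.0 (ties broken
    alphabetically here for determinism)."""
    freq = Counter(values)
    ordered = sorted(freq, key=lambda v: (-freq[v], v))
    return {label: float(i) for i, label in enumerate(ordered)}

def fit_per_column(rows, columns):
    """The workaround for the missing multi-column API: one indexer model
    per column, collected into a dict (in Spark, one Pipeline stage each)."""
    return {c: fit_string_indexer([row[c] for row in rows]) for c in columns}

rows = [{"color": "red", "size": "S"},
        {"color": "red", "size": "M"},
        {"color": "blue", "size": "M"}]
models = fit_per_column(rows, ["color", "size"])
print(models["color"])
```

In Spark Java the equivalent is a loop that builds `new StringIndexer().setInputCol(c).setOutputCol(c + "_idx")` for each column and passes the resulting stages to a Pipeline.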

java.lang.NullPointerException

2018-05-10 Thread Mina Aslani
spark.createDataFrame(data, schema); Any idea? I am using Java; therefore, I cannot convert my RDD with toDF(). Instead, I try to get the specific fields/values and call createDataFrame manually. Regards, Mina

AWS credentials needed while trying to read a model from S3 in Spark

2018-05-09 Thread Mina Aslani
Hi, I am trying to load a ML model from AWS S3 in my Spark app running in a docker container; however, I need to pass the AWS credentials. My question is, why do I need to pass the credentials in the path? And what is the workaround? Best regards, Mina
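With the hadoop-aws (s3a) connector, credentials do not have to be embedded in the path; they can be supplied through Hadoop configuration (or picked up from an EC2 instance profile). A sketch of the relevant spark-submit configuration; the `fs.s3a.*` key names are the standard ones for the s3a connector, but verify them against your Hadoop version, and the bucket/path below are placeholders:

```properties
# Passed as --conf options to spark-submit, or set in core-site.xml,
# instead of inlining credentials into the S3 URL.
spark.hadoop.fs.s3a.access.key=YOUR_ACCESS_KEY
spark.hadoop.fs.s3a.secret.key=YOUR_SECRET_KEY
# The model can then be loaded from a plain path with no credentials in it:
#   s3a://my-bucket/models/my-model
```

Embedding credentials in the URL itself is discouraged because they end up in logs and the Spark UI; configuration keys or instance profiles keep them out of the path entirely.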

MappingException - org.apache.spark.mllib.classification.LogisticRegressionModel.load

2018-05-02 Thread Mina Aslani
? Your input is appreciated. Best regards, Mina

Re: java.lang.UnsupportedOperationException: CSV data source does not support struct/ERROR RetryingBlockFetcher

2018-03-27 Thread Mina Aslani
ot save df as csv as it throws java.lang.UnsupportedOperationException: CSV data source does not support struct,values:array> data type. Any idea? Best regards, Mina On Tue, Mar 27, 2018 at 10:51 PM, naresh Goud wrote: > In case of storing as parquet file I don’t think it requires header

java.lang.UnsupportedOperationException: CSV data source does not support struct/ERROR RetryingBlockFetcher

2018-03-27 Thread Mina Aslani
.save("output") The above command saves data but it's in parquet format. How can I read parquet file and convert to csv to observe the data? When I use df = spark.read.parquet("1.parquet"), it throws: ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks Your input is appreciated. Best regards, Mina

Re: Serialize a DataFrame with Vector values into text/csv file

2018-02-20 Thread Mina Aslani
Hi, I was hoping that there is a method for casting a vector into a String (instead of writing my own UDF), so that it can then be serialized into a csv/text file. Best regards, Mina On Tue, Feb 20, 2018 at 6:52 PM, vermanurag wrote: > If your dataframe has columns types like vector then you cannot s
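There is no built-in vector-to-String cast for this, so the UDF route is the usual answer: flatten each vector's values into a delimited string, after which the column is a plain string type the CSV writer accepts. The conversion itself is ordinary Python (the function name below is made up for illustration; in pyspark you would wrap it with `pyspark.sql.functions.udf` and apply it with `withColumn` before `df.write.csv`):

```python
def vector_to_str(values, sep=","):
    """Flatten a vector's values into one delimited string so the column
    becomes a plain string type that the CSV data source can write."""
    return sep.join(repr(float(v)) for v in values)

print(vector_to_str([0.0, 1.0, 2.5]))  # 0.0,1.0,2.5
```

For a sparse vector you would first densify (e.g. call its `toArray()` in pyspark) before joining, otherwise only the stored values would be written.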

Re: Serialize a DataFrame with Vector values into text/csv file

2018-02-20 Thread Mina Aslani
Hi Snehasish, Unfortunately, none of the solutions worked. Regards, Mina On Tue, Feb 20, 2018 at 5:12 PM, SNEHASISH DUTTA wrote: > Hi Mina, > > Even text won't work. You may try this: df.coalesce(1).write.option("header","true").mode("overwrite"

Re: Serialize a DataFrame with Vector values into text/csv file

2018-02-20 Thread Mina Aslani
Hi Snehasish, Using df.coalesce(1).write.option("header","true").mode("overwrite").csv("output") throws java.lang.UnsupportedOperationException: CSV data source does not support struct<...> data type. Regards, Mina On Tue, Feb 20, 2018

Serialize a DataFrame with Vector values into text/csv file

2018-02-20 Thread Mina Aslani
Hi, I would like to serialize a dataframe with vector values into a text/csv file in pyspark. Using the line below, I can write the dataframe (e.g. df) as parquet; however, I cannot open it in Excel/as text. df.coalesce(1).write.option("header","true").mode("overwrite").save("output") Best regards, Mina

Write a DataFrame with Vector values into text/csv file

2018-02-20 Thread Mina Aslani
) Wondering how to save the result of a MLib transformation function(e.g. oneHotEncoder) which generates vectors into a file. Best regards, Mina

org.apache.kafka.clients.consumer.OffsetOutOfRangeException

2018-02-12 Thread Mina Aslani
.g. default=none). Wondering what is the cause and how to fix. Best regards, Mina

Re: No space left on device

2017-10-17 Thread Mina Aslani
I have not tried rdd.unpersist(); I thought setting rdd = null was the same, is it not? On Wed, Oct 18, 2017 at 1:07 AM, Imran Rajjad wrote: > did you try calling rdd.unpersist() > > On Wed, Oct 18, 2017 at 10:04 AM, Mina Aslani > wrote: > >> Hi, >> >> I get

No space left on device

2017-10-17 Thread Mina Aslani
publishes the result into kafka. I set my RDD = null after I finish working with it, so that intermediate shuffle files are removed quickly. How can I avoid "No space left on device"? Best regards, Mina

SparkException: Invalid master URL

2017-07-10 Thread Mina Aslani
Hi, I get the error below when I try to run a job in a swarm node. Can you please let me know what the problem is and how it can be fixed? Best regards, Mina util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Exception

Java Examples @ Spark github

2017-03-13 Thread Mina Aslani
2.11/book/github/etc? Regards, Mina

Java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem

2017-03-13 Thread Mina Aslani
ot running locally. I tried using master="local[1]" same problem. Any idea? Regards, Mina

Re: Failed to connect to master ...

2017-03-07 Thread Mina Aslani
Master and worker processes are running! On Wed, Mar 8, 2017 at 12:38 AM, ayan guha wrote: > You need to start Master and worker processes before connecting to them. > > On Wed, Mar 8, 2017 at 3:33 PM, Mina Aslani wrote: > >> Hi, >> >> I am writing a spark Trans

Failed to connect to master ...

2017-03-07 Thread Mina Aslani
- Why is no exception thrown when using "local[1]", and how do I set up reading from kafka in the VM? - How do I stream from Kafka (the data in the topic is in json format)? Your input is appreciated! Best regards, Mina

Re: org.apache.spark.SparkException: Task not serializable

2017-03-06 Thread Mina Aslani
Thank you Ankur for the quick response, really appreciate it! Making the class serializable resolved the exception! Best regards, Mina On Mon, Mar 6, 2017 at 4:20 PM, Ankur Srivastava wrote: > The fix for this make your class Serializable. The reason being the > closures you have defi

org.apache.spark.SparkException: Task not serializable

2017-03-06 Thread Mina Aslani
thrown? Best regards, Mina System.out.println("Creating Spark Configuration"); SparkConf javaConf = new SparkConf(); javaConf.setAppName("My First Spark Java Application"); javaConf.setMaster("PATH to my spark"); System.out.println("Creating Spark Contex
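Spark ships the closures passed to transformations to the executors, so everything a closure captures must be serializable: in the Java API that means implementing java.io.Serializable (the fix confirmed in the reply above), and PySpark has the same constraint via pickle. A minimal Python reproduction of the pattern, as a hedged analogy rather than Spark's own code: a method that drags along a non-picklable resource fails to serialize, while a self-contained top-level function works.

```python
import pickle
import threading

class BadMapper:
    """Holds a lock, which cannot be pickled; shipping self.apply to
    executors would fail, much like a non-Serializable Java class whose
    method is used inside a Spark closure."""
    def __init__(self):
        self.lock = threading.Lock()

    def apply(self, x):
        return x * 2

def good_mapper(x):
    """Self-contained top-level function: serializes cleanly."""
    return x * 2

pickle.dumps(good_mapper)  # succeeds
try:
    # Pickling the bound method pickles the instance, lock and all.
    pickle.dumps(BadMapper().apply)
except TypeError as e:
    print("not serializable:", e)
```

The Java-side fixes are the same in spirit: make the enclosing class Serializable, mark non-serializable fields transient, or move the logic into a static/standalone function that captures nothing.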

Getting unrecoverable exception: java.lang.NullPointerException when trying to find wordcount in kafka topic

2017-02-26 Thread Mina Aslani
streaming as well, same error exists! Any idea about cause of the error? Kindest regards, Mina

Apache Spark MLIB

2017-02-23 Thread Mina Aslani
thoughts/experience/insight with me. Best regards, Mina

Re: indexedrdd and radix tree: how to search indexedRDD using all prefixes?

2015-11-24 Thread Mina
This is what a Radix tree returns -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/indexedrdd-and-radix-tree-how-to-search-indexedRDD-using-all-prefixes-tp25459p25460.html Sent from the Apache Spark User List mailing list archive at Nabble.com. -

indexedrdd and radix tree: how to search indexedRDD using all prefixes?

2015-11-24 Thread Mina
from an indexedRDD. Thank you, Mina
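IndexedRDD keeps a radix-tree (PART) index per partition, and a prefix query is essentially a walk down that tree followed by a scan of the subtree under the prefix node. A small standalone trie in plain Python shows the "all keys under a prefix" lookup the question is after (a sketch of the idea, not IndexedRDD's actual implementation):

```python
class Trie:
    """Minimal trie; IndexedRDD uses a compressed (radix) variant per
    partition, but the prefix-scan logic is the same."""

    def __init__(self):
        self.root = {}

    def insert(self, key):
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {})
        node["$"] = True  # end-of-key marker

    def keys_with_prefix(self, prefix):
        # Walk down to the node for the prefix, then collect its subtree.
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        out = []
        def walk(n, acc):
            if "$" in n:
                out.append(acc)
            for ch, child in n.items():
                if ch != "$":
                    walk(child, acc + ch)
        walk(node, prefix)
        return sorted(out)

t = Trie()
for k in ["spark", "spar", "scala", "spartan"]:
    t.insert(k)
print(t.keys_with_prefix("spar"))  # ['spar', 'spark', 'spartan']
```

To search with *all* prefixes of a query string, you would call `keys_with_prefix` once per prefix length; a radix tree makes each walk proportional to the prefix length rather than the number of keys.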

Berkeley DB storage for Spark

2015-10-27 Thread Mina
I would like to store my data in a Berkeley DB in Hadoop and run Spark for data processing. Is it possible? Thanks Mina

Re: Using reference for RDD is safe?

2015-07-20 Thread Mina
Hi, thank you for your answer, but I was talking about a function reference. I want to transform an RDD using a function consisting of multiple transforms. For example: def transformFunc1(rdd: RDD[Int]): RDD[Int] = { } val rdd2 = transformFunc1(rdd1)... Here I am using a reference, I think, but I am