Re: 'requirement failed: OneHotEncoderModel expected x categorical values for input column label, but the input column had metadata specifying n values.'

2019-11-05 Thread Mina Aslani
at 3:55 PM Mina Aslani wrote:
> Hi,
>
> I am getting the following exception when I am using OneHotEncoderEstimator with MultilayerPerceptronClassifier in PySpark (using version 2.4.4).
>
> 'requirement failed: OneHotEncoderModel expected x categorical values for

'requirement failed: OneHotEncoderModel expected x categorical values for input column label, but the input column had metadata specifying n values.'

2019-11-05 Thread Mina Aslani
Hi, I am getting the following exception when I am using OneHotEncoderEstimator with MultilayerPerceptronClassifier in PySpark (using version 2.4.4): 'requirement failed: OneHotEncoderModel expected x categorical values for input column label, but the input column had metadata specifying n

Do we need to kill a spark job every time we change and deploy it?

2018-11-28 Thread Mina Aslani
Hi, I have a question for you. Do we need to kill a spark job every time we change and deploy it to cluster? Or, is there a way for Spark to automatically pick up the recent jar version? Best regards, Mina

Re: Time-Series Forecasting

2018-10-01 Thread Mina Aslani
> … questions and compare for yourself.
>
> Regards,
> Gourav Sengupta
>
> On Wed, Sep 19, 2018 at 5:02 PM Mina Aslani wrote:
>> Hi,
>> I have a question for you. Do we have any Time-Series Forecasting library in Spark?
>>
>> Best regards,
>> Mina

Re: Time-Series Forecasting

2018-09-19 Thread Mina Aslani
Which functionality do you need? I.e. which methods?

> On 19. Sep 2018, at 18:01, Mina Aslani wrote:
>> Hi,
>> I have a question for you. Do we have any Time-Series Forecasting library in Spark?
>>
>> Best regards,
>> Mina

Re: Time-Series Forecasting

2018-09-19 Thread Mina Aslani
Hi, I saw spark-ts <https://github.com/sryza/spark-timeseries>; however, it looks like it is not under active development any more. I would really appreciate your insight. Kindest regards, Mina

On Wed, Sep 19, 2018 at 12:01 PM Mina Aslani wrote:
> Hi,
> I have a question for you. Do

Time-Series Forecasting

2018-09-19 Thread Mina Aslani
Hi, I have a question for you. Do we have any Time-Series Forecasting library in Spark? Best regards, Mina

How to get MultilayerPerceptronClassifier model parameters?

2018-08-10 Thread Mina Aslani
Hi, How can I get the parameters of my MultilayerPerceptronClassifier model? I can only get the layers parameter, using myModel.layers. For the other parameters, when I use myModel.getSeed()/myModel.getTol()/myModel.getMaxIter() I get the error below: 'MultilayerPerceptronClassificationModel' object has

MultilayerPerceptronClassifier

2018-08-09 Thread Mina Aslani
Hi, I have a couple of questions regarding using MultilayerPerceptronClassifier in pyspark. - Do we have any options for solver other than solver='l-bfgs'? - Also, I tried to tune using cross validation and find the best model based on the defined parameters (e.g. maxIter, layers, tol, seed).

Semi-Supervised self-training (e.g. partial fitting)

2018-06-27 Thread Mina Aslani
Hi, Is partial fitting/self-training available for a classifier (e.g. Regression) in Apache Spark? Best regards, Mina

java.lang.IllegalArgumentException: requirement failed: BLAS.dot(x: Vector, y:Vector) was given Vectors with non-matching sizes

2018-05-15 Thread Mina Aslani
Hi, I am trying to test my spark app implemented in Java. In my spark app I load the logisticRegressionModel that I have already created, trained and tested using a portion of the training data. Now, when I test my spark app with another set of data and try to predict, I get the error below when

Re: How to use StringIndexer for multiple input /output columns in Spark Java

2018-05-15 Thread Mina Aslani
> Still in progress I think - should be available in 2.4.0
>
> On Mon, 14 May 2018 at 22:32, Mina Aslani <aslanim...@gmail.com> wrote:
>> Please take a look at the api doc: https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/ml/feature/StringIndexer.html

OneHotEncoderEstimator - java.lang.NoSuchMethodError: org.apache.spark.sql.Dataset.withColumns

2018-05-15 Thread Mina Aslani
Hi, I get the error below when I try to run the OneHotEncoderEstimator example: https://github.com/apache/spark/blob/b74366481cc87490adf4e69d26389ec737548c15/examples/src/main/java/org/apache/spark/examples/ml/JavaOneHotEncoderEstimatorExample.java#L67 which is this line of the code:

Re: How to use StringIndexer for multiple input /output columns in Spark Java

2018-05-14 Thread Mina Aslani
Please take a look at the api doc: https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/ml/feature/StringIndexer.html

On Mon, May 14, 2018 at 4:30 PM, Mina Aslani <aslanim...@gmail.com> wrote:
> Hi,
>
> There is no SetInputCols/SetOutputCols for StringIndexer in Sp

How to use StringIndexer for multiple input /output columns in Spark Java

2018-05-14 Thread Mina Aslani
Hi, There is no setInputCols/setOutputCols for StringIndexer in Spark Java. How can multiple input/output columns be specified then? Regards, Mina

java.lang.NullPointerException

2018-05-10 Thread Mina Aslani
Hi, I get java.lang.NullPointerException at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:128) When I try to createDataFrame using the sparkSession, see below: SparkConf conf = new SparkConf().setMaster().setAppName("test");

AWS credentials needed while trying to read a model from S3 in Spark

2018-05-09 Thread Mina Aslani
Hi, I am trying to load a ML model from AWS S3 in my spark app running in a docker container; however, I need to pass the AWS credentials. My question is, why do I need to pass the credentials in the path? And what is the workaround? Best regards, Mina
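One common alternative to embedding credentials in the S3 path is to set them through the Hadoop configuration instead, e.g. via `spark-defaults.conf` (a sketch assuming the `s3a://` connector; the key values are placeholders):

```properties
# spark.hadoop.* properties are copied into the Hadoop configuration.
spark.hadoop.fs.s3a.access.key   YOUR_ACCESS_KEY
spark.hadoop.fs.s3a.secret.key   YOUR_SECRET_KEY
```

When the app runs on AWS infrastructure, an IAM instance role avoids shipping static keys to the container at all.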

MappingException - org.apache.spark.mllib.classification.LogisticRegressionModel.load

2018-05-02 Thread Mina Aslani
Hi, I used pyspark to create a Logistic Regression model, train it on my training data and evaluate it on my test data using the ML api. However, to use the model in my program, I saved the model (e.g. the Logistic Regression model), and when I tried to load it in pyspark using sameModel =

Re: java.lang.UnsupportedOperationException: CSV data source does not support struct/ERROR RetryingBlockFetcher

2018-03-27 Thread Mina Aslani
> In case of storing as parquet file I don't think it requires header.
> option("header","true")
>
> Give a try by removing the header option and then try to read it. I haven't tried, just a thought.
>
> Thank you,
> Naresh
>
> On Tue, Mar 27,

java.lang.UnsupportedOperationException: CSV data source does not support struct/ERROR RetryingBlockFetcher

2018-03-27 Thread Mina Aslani
Hi, I am using pyspark. To transform my sample data and create a model, I use StringIndexer and OneHotEncoder. However, when I try to write the data as csv using the command below, df.coalesce(1).write.option("header","true").mode("overwrite").csv("output.csv") I get UnsupportedOperationException

Re: Serialize a DataFrame with Vector values into text/csv file

2018-02-20 Thread Mina Aslani
Hi, I was hoping there is a method to cast a Vector into a String (instead of writing my own UDF), so that it can then be serialized into a csv/text file. Best regards, Mina

On Tue, Feb 20, 2018 at 6:52 PM, vermanurag wrote:
> If your dataframe has columns types

Re: Serialize a DataFrame with Vector values into text/csv file

2018-02-20 Thread Mina Aslani
>> This might work then
>>
>> df.coalesce(1).write.option("header","true").mode("overwrite").text("output")
>>
>> Regards,
>> Snehasish
>>
>> On Wed, Feb 21, 2018 at 3:21 AM, Mina Aslani <aslanim...@gmail.com>

Re: Serialize a DataFrame with Vector values into text/csv file

2018-02-20 Thread Mina Aslani
PM, SNEHASISH DUTTA <info.snehas...@gmail.com> wrote:
> Hi Mina,
> This might help
> df.coalesce(1).write.option("header","true").mode("overwrite").csv("output")
>
> Regards,
> Snehasish
>
> On Wed, Feb 21, 2018 at 1:5

Serialize a DataFrame with Vector values into text/csv file

2018-02-20 Thread Mina Aslani
Hi, I would like to serialize a dataframe with vector values into a text/csv file in pyspark. Using the line below, I can write the dataframe (e.g. df) as parquet; however, I cannot open it in excel/as text. df.coalesce(1).write.option("header","true").mode("overwrite").save("output") Best regards, Mina

Write a DataFrame with Vector values into text/csv file

2018-02-20 Thread Mina Aslani
Hi, I would like to write a dataframe with vector values into a text/csv file. Using the line below, I can write it as parquet; however, I cannot open it in excel/as text. df.coalesce(1).write.option("header","true").mode("overwrite").save("stage-s3logs-model") Wondering how to save the result of a

org.apache.kafka.clients.consumer.OffsetOutOfRangeException

2018-02-12 Thread Mina Aslani
Hi, I am getting the error below, Caused by: org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: {topic1-0=304337}, as soon as I submit a spark app to my cluster. I am using below dependency name:
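The "no configured reset policy" part of this exception usually means the consumer's stored or starting offsets point at messages Kafka has already deleted (retention expired) while `auto.offset.reset` is effectively "none". A hedged sketch of a minimal consumer parameter map (broker address and group id are placeholders) for the Kafka integration:

```python
# Invented names for illustration; pass this map to the Kafka direct stream /
# structured streaming source configuration.
kafka_params = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "my-consumer-group",
    # Fall back to the earliest retained offset instead of raising
    # OffsetOutOfRangeException when the stored offset is gone.
    "auto.offset.reset": "earliest",
}
```

If the app must not skip data, the alternative is to raise the topic's retention so offsets cannot expire between runs.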

Re: No space left on device

2017-10-17 Thread Mina Aslani
I have not tried rdd.unpersist(); I thought using rdd = null is the same, is it not?

On Wed, Oct 18, 2017 at 1:07 AM, Imran Rajjad <raj...@gmail.com> wrote:
> did you try calling rdd.unpersist()
>
> On Wed, Oct 18, 2017 at 10:04 AM, Mina Aslani <aslanim...@gmail.com>

No space left on device

2017-10-17 Thread Mina Aslani
Hi, I get "No space left on device" error in my spark worker: Error writing stream to file /usr/spark-2.2.0/work/app-.../0/stderr java.io.IOException: No space left on device In my spark cluster, I have one worker and one master. My program consumes stream of data from kafka and publishes
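When the disk is filling with old application directories under the worker's work directory (an assumption about this particular cluster), the standalone worker can be told to clean them up periodically. A sketch for `spark-defaults.conf` on the worker:

```properties
# Enable periodic cleanup of finished applications' work dirs.
spark.worker.cleanup.enabled    true
# Seconds between cleanup runs.
spark.worker.cleanup.interval   1800
# Seconds to retain each application directory (7 days).
spark.worker.cleanup.appDataTtl 604800
```

For a long-running streaming app, oversized stderr/stdout logs in the same directory are another frequent culprit worth checking.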

SparkException: Invalid master URL

2017-07-10 Thread Mina Aslani
Hi, I get the error below when I try to run a job in a swarm node. Can you please let me know what the problem is and how it can be fixed? Best regards, Mina util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Exception

Java Examples @ Spark github

2017-03-13 Thread Mina Aslani
Hi, When I go to GitHub and check the Java examples at https://github.com/apache/spark/tree/master/examples/src/main/java/org/apache/spark/examples, they do not look to be updated to the latest spark (e.g. spark 2.11). Do you know by any chance where I can find the java examples for spark

Java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem

2017-03-13 Thread Mina Aslani
Hi, I get IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem at the line specified below: String master = "spark://:7077"; SparkConf sparkConf = new SparkConf()

Re: Failed to connect to master ...

2017-03-07 Thread Mina Aslani
Master and worker processes are running!

On Wed, Mar 8, 2017 at 12:38 AM, ayan guha <guha.a...@gmail.com> wrote:
> You need to start Master and worker processes before connecting to them.
>
> On Wed, Mar 8, 2017 at 3:33 PM, Mina Aslani <aslanim...@gmail.com> wrote:

Failed to connect to master ...

2017-03-07 Thread Mina Aslani
Hi, I am writing a spark Transformer in IntelliJ in Java and trying to connect to the spark in a VM using setMaster. I get "Failed to connect to master ...": 17/03/07 16:20:55 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master VM_IPAddress:7077

Re: org.apache.spark.SparkException: Task not serializable

2017-03-06 Thread Mina Aslani
> … the reason being the closures you have defined in the class need to be serialized and copied over to all executor nodes.
>
> Hope this helps.
>
> Thanks
> Ankur
>
> On Mon, Mar 6, 2017 at 1:06 PM, Mina Aslani <aslanim...@gmail.com> wrote:
>> Hi,
>>
>> I

org.apache.spark.SparkException: Task not serializable

2017-03-06 Thread Mina Aslani
Hi, I am getting started with spark and trying to get the number of lines of a text file on my Mac; however, I get org.apache.spark.SparkException: Task not serializable on JavaRDD logData = javaCtx.textFile(file); Please see below for the code sample and the stack trace. Any idea why this error is

Getting unrecoverable exception: java.lang.NullPointerException when trying to find wordcount in kafka topic

2017-02-26 Thread Mina Aslani
Hi, I am trying to submit a job to spark to count the number of words in a specific kafka topic, but I get the exception below when I check the log: … failed with unrecoverable exception: java.lang.NullPointerException The command that I run follows: ./scripts/dm-spark-submit.sh --class

Apache Spark MLIB

2017-02-23 Thread Mina Aslani
Hi, I am going to start working on anomaly detection using Spark MLlib. Please note that I have not used Spark so far. I would like to read some data, and if a user logged in from a different ip address which is not common, consider it an anomaly, similar to what apple/google does. My preferred