Parquet

2018-07-19 Thread amin mohebbi
We have two big tables, each with 5 billion rows, so my question is: should we partition/sort the data and convert it to Parquet before doing any join? Best Regards ... Amin Mohebbi PhD candidate in Software Engineering at

Unpivoting

2018-07-10 Thread amin mohebbi
Does anyone know how to transpose columns in Spark (Scala)? This is how I want to unpivot the table: How to unpivot the table based on the multiple columns. I am using Scala and Spark to unpivot a tabl
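To make the reshaping in this question concrete, here is a minimal sketch of unpivoting (wide to long) in plain Python rather than Spark. The column names (`id`, `q1`, `q2`) and the helper `unpivot` are hypothetical, since the original table is not shown in the snippet. In Spark itself, the same reshaping is commonly written with the SQL `stack` generator function.

```python
from typing import Any, Dict, Iterable, Iterator, Sequence

Row = Dict[str, Any]


def unpivot(
    rows: Iterable[Row],
    id_cols: Sequence[str],
    value_cols: Sequence[str],
    var_name: str = "variable",
    value_name: str = "value",
) -> Iterator[Row]:
    """Turn each wide row into one long row per value column."""
    for row in rows:
        # Identifier columns are copied unchanged onto every output row.
        ids = {col: row[col] for col in id_cols}
        for col in value_cols:
            # The former column name becomes data in `var_name`.
            yield {**ids, var_name: col, value_name: row[col]}


# Hypothetical wide table: one row per id, one column per measurement.
wide = [
    {"id": 1, "q1": 10, "q2": 20},
    {"id": 2, "q1": 30, "q2": 40},
]

long_rows = list(unpivot(wide, id_cols=["id"], value_cols=["q1", "q2"]))
for r in long_rows:
    print(r)
```

Each input row produces one output row per value column, so two rows with two value columns give four long rows.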

Interactive queries

2018-06-29 Thread amin mohebbi
... Amin Mohebbi PhD candidate in Software Engineering at university of Malaysia Tel : +60 18 2040 017 E-Mail : tp025...@ex.apiit.edu.my amin_...@me.com

submitting dependencies

2018-06-26 Thread amin mohebbi
-jars /home/sshuser/reactiveinflux-spark_2.10-1.4.0.10.0.5.1.jar sapn_2.11-1.0.jar Can you help me solve this issue?

Big data visualization

2018-05-27 Thread amin mohebbi
? file system / time series DB / Azure Cosmos / standard DB? 2- Is it the right approach to use Spark as an ETL and aggregation application, store the results somewhere, and use Power BI for reporting and dashboards?

Time series data

2018-05-23 Thread amin mohebbi
Spark with NoSQL, as I think the combination of these two could provide random access and let different users run many queries. 2- Do we really need to use a time series DB?

Mllib Error

2014-12-11 Thread amin mohebbi
"org.apache.spark" % "spark-core_2.10" % "1.1.1", But there is still an error that says: unresolved dependency spark-mllib;1.1.1: not found. Does anyone know how to add the MLlib dependency in the .sbt file?

Mllib error

2014-12-09 Thread amin mohebbi

K-means clustering

2014-11-25 Thread amin mohebbi
that I do not want to use MLlib and would like to write my own k-means.
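For readers who want to write k-means by hand, as this message asks, the following is a minimal single-machine sketch of Lloyd's algorithm in plain Python. It is not the poster's code and not MLlib's implementation. The function names, the sample data, and the choice to pass initial centers explicitly are all assumptions made for illustration. A distributed version would compute the assignment step per partition and aggregate per-cluster sums and counts.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_center(point: Point, centers: Sequence[Point]) -> int:
    """Index of the center nearest to `point`."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(
    points: Sequence[Point],
    initial_centers: Sequence[Point],
    max_iter: int = 100,
) -> List[Point]:
    """Lloyd's algorithm: alternate assignment and center update."""
    centers = list(initial_centers)
    for _ in range(max_iter):
        # Assignment step: group points by their nearest center.
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in points:
            clusters[closest_center(p, centers)].append(p)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        new_centers = [
            mean(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # No center moved: converged.
            break
        centers = new_centers
    return centers


# Hypothetical data: two well-separated groups of four points each.
data = [
    (0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0),
    (10.0, 10.0), (10.0, 12.0), (12.0, 10.0), (12.0, 12.0),
]
centers = kmeans(data, initial_centers=[(0.0, 0.0), (12.0, 12.0)])
print(centers)
```

The result depends on the initial centers: a poor starting choice can converge to a worse local optimum, which is why the sketch leaves initialization to the caller.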

k-means clustering

2014-11-18 Thread amin mohebbi
ne explain to me how I can do the pre-processing step before running k-means using Spark.

Pyspark Error

2014-11-18 Thread amin mohebbi
no -5] No address associated with hostname >>> sc.parallelize(range(1000)).count() Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'sc' is not defined >>> sc Traceback (most recent call last): File "<stdin>", line 1, in <module> NameE

canopy clustering

2014-11-10 Thread amin mohebbi

Kmeans

2014-07-16 Thread amin mohebbi
Can anyone explain to me the difference between k-means in MLlib and kmeans in examples/src/main/python/kmeans.py?

How to host spark driver

2014-07-09 Thread amin mohebbi
-submit?

Re: spark Driver

2014-07-08 Thread amin mohebbi
ckoverflow.com/questions/24571922/apache-spark-stderr-and-stdout/24594576#24594576 I am not sure whether I need to set an IP address for the driver. Do I need a separate machine for the driver?

Re: spark Driver

2014-07-08 Thread amin mohebbi
I have the following in spark-env.sh: SPARK_MASTER_IP=master SPARK_MASTER_port=7077

spark Driver

2014-07-08 Thread amin mohebbi
rker@slave2:41483/user/Worker" "app-20140704174955-0002" 14/07/04 17:50:14 ERROR CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave2:33758] -> [akka.tcp://spark@master:54477] disassociated! Shutting down