Hello Sparkers,
I'm reading data from a CSV file, applying some transformations, and ending
up with an RDD of (String, Iterable<>) pairs.
I have already prepared Parquet files. I now want to take the previous
(key, value) RDD and populate the Parquet files as follows:
- key holds the name of the
Hi,
I have a pair RDD of the form: (mykey, (value1, value2))
How can I create a DataFrame with the schema [V1 String, V2 String] to
store [value1, value2] and save it into a Parquet table named "mykey"?
The /createDataFrame()/ method takes an RDD and a schema (StructType) as
parameters. The sc
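A minimal sketch of the pattern being asked about, using the Spark 1.x API that this thread is written against. The RDD name, table/path names, and the use of `groupByKey` plus a driver-side loop are assumptions for illustration, not the poster's actual code:

```scala
// Sketch: build a DataFrame with an explicit schema from a pair RDD of
// the form (key, (value1, value2)) and write one Parquet table per key.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, StringType}

val schema = StructType(Seq(
  StructField("V1", StringType, nullable = true),
  StructField("V2", StringType, nullable = true)))

// pairRDD: RDD[(String, (String, String))] -- placeholder name
pairRDD.groupByKey().collect().foreach { case (key, values) =>
  val rows = sc.parallelize(values.toSeq).map { case (v1, v2) => Row(v1, v2) }
  val df = sqlContext.createDataFrame(rows, schema)
  df.write.parquet(s"/tmp/parquet/$key")  // one Parquet directory per key
}
```

Note that `collect()` pulls all keys to the driver, which only works when the number of distinct keys is small; for many keys, partitioned writes are the usual alternative.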
Hi all,
I'm running SQL queries (sqlContext.sql()) on Parquet tables and facing
a problem with table caching (sqlContext.cacheTable()), using
spark-shell of Spark 1.5.1.
After I run the sqlContext.cacheTable(table), the sqlContext.sql(query)
takes longer the first time (well, for the lazy exe
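This matches how `cacheTable()` behaves in Spark 1.x: it is lazy, so the first query after the call pays the one-time cost of building the in-memory columnar cache. A common sketch for getting comparable timings, with `my_table` and the `id` column as placeholder names:

```scala
// cacheTable() only marks the table; nothing is cached yet.
sqlContext.cacheTable("my_table")

// A cheap action materializes the cache up front:
val n = sqlContext.sql("SELECT COUNT(*) FROM my_table")
  .collect()(0).getLong(0)

// Subsequent queries scan the in-memory columnar cache:
sqlContext.sql("SELECT * FROM my_table WHERE id = 42").show()
```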
If you can show a snippet of your code, that would help give us more clues.
Thanks
On Mar 24, 2016, at 2:43 AM, Mohamed Nadjib MAMI wrote:
Hi all,
I'm running SQL queries (sqlContext.sql()) on Parquet tables and facing a
problem with table caching (sqlContext.cacheTable()), using spark-she
6.id=p.id ORDER BY p.`bbb` LIMIT 10"
On 24.03.2016 22:16, Ted Yu wrote:
Can you obtain output from explain(true) on the query after
cacheTable() call ?
Potentially related JIRA:
[SPARK-13657] [SQL] Support parsing very long AND/OR expressions
On Thu, Mar 24, 2016 at 12:55 PM, Mo
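A sketch of how to get the plan Ted is asking for, and what to look for in it. The table name is a placeholder; the operator name is the one Spark 1.5 uses for cached scans:

```scala
sqlContext.cacheTable("my_table")
sqlContext.sql("SELECT COUNT(*) FROM my_table").collect()  // materialize the cache

// explain(true) prints the logical and physical plans; capturing the
// physical plan as a string lets you inspect it programmatically too.
val plan = sqlContext.sql("SELECT * FROM my_table")
  .queryExecution.executedPlan.toString
println(plan)
// If the cache is used, the plan shows "InMemoryColumnarTableScan";
// a plain Parquet scan instead means the cache is being bypassed.
```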
I noticed for most SQL queries (sqlContext.sql(query)) I ran on
Parquet tables that results are returned faster after the first and
second runs of the query. Is this variation normal, i.e., can two
executions of the same job take different times? Or are there some
intermediate results bein
Hello all,
I'm getting the famous /java.io.FileNotFoundException: ... (Too many
open files)/ exception. What seemed to help others hasn't worked
for me. I tried setting the ulimit via the command line /"ulimit
-n"/, then I tried adding the following lines to
/"/etc/security/limits.con
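For reference, a sketch of the usual two-step fix. A common pitfall is that `ulimit -n` only affects the current shell, so it has to be run in the shell that actually launches the Spark processes; the user name and limit value below are examples:

```shell
# Session-only: raise the open-file limit before starting spark-shell.
ulimit -n 65535

# Permanent: append to /etc/security/limits.conf (needs root), then
# log out and back in for pam_limits to pick it up:
#   sparkuser  soft  nofile  65535
#   sparkuser  hard  nofile  65535

# Verify the limit the current shell (and its children) will see:
ulimit -n
```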
Hello all,
Could someone please help me figure out what's wrong with my query over
Parquet tables? The query has the following form:
weird_query = "SELECT a._example.com/aa/1.1/aa_,
b._example.com/bb/1.2/bb_ FROM www$aa@aa a LEFT JOIN www$bb@bb b ON
a.http://example.de/cc=b.co
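One likely culprit, offered as an assumption since the full query is truncated: identifiers containing dots, slashes, `$`, `@`, or `://` must be backtick-quoted in Spark SQL, otherwise the parser splits them on `.` into table.column references (so `a.http://example.de/cc` is read as column `http` of table `a` followed by garbage). A sketch with example names shaped like the ones in the failing query:

```scala
// Backtick-quote every identifier that contains special characters.
val q = """SELECT a.`example.com/aa/1.1/aa`, b.`example.com/bb/1.2/bb`
           FROM `www$aa@aa` a
           LEFT JOIN `www$bb@bb` b
           ON a.`http://example.de/cc` = b.`http://example.de/cc`"""
sqlContext.sql(q).show()
```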
Your jars are not delivered to the workers. Have a look at this:
http://stackoverflow.com/questions/24052899/how-to-make-it-easier-to-deploy-my-jar-to-spark-cluster-in-standalone-mode
in Parquet
tables. Any help on solving or working around this would be much appreciated.
*Regards, Grüße, **Cordialement,** Recuerdos, Saluti, προσρήσεις, 问候,
تحياتي.*
*Mohamed Nadjib Mami*
*PhD Student - EIS Department - **Bonn University (Germany).*
*About me! <http://www.strikingly.com/mohame
Hello,
I've asked the following question [1] on Stack Overflow but haven't
gotten an answer yet. I'm now using this channel to give it more
visibility and hopefully find someone who can help.
"*Context.* I have tens of SQL queries stored in separate files. For
benchmarking purposes, I created an ap
I paste this right from Spark shell (Spark 2.1.0):
scala> spark.sql("SELECT count(distinct col) FROM Table").show()
+-------------------+
|count(DISTINCT col)|
+-------------------+
|               4697|
+-------------------+

scala> spark.sql
That was the case. Thanks for the quick and clean answer, Hemanth.
*Regards, Grüße, **Cordialement,** Recuerdos, Saluti, προσρήσεις, 问候,
تحياتي.*
*Mohamed Nadjib Mami*
*Research Associate @ Fraunhofer IAIS - PhD Student @ Bonn University*
*About me! <http://www.strikingly.com/mohamed-nadjib-m
Spark 2.1, so I think it doesn't include the new cost-based
optimizations (introduced in Spark 2.2).