Spark 2.1, so I think it doesn't include the new cost-based
optimizer (CBO), which was introduced in Spark 2.2.
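For reference, in 2.2+ the cost-based optimizer is opt-in and relies on
previously collected statistics. A minimal sketch (table name illustrative):

// Spark 2.2+ only: the CBO is off by default.
spark.conf.set("spark.sql.cbo.enabled", "true")
spark.sql("ANALYZE TABLE myTable COMPUTE STATISTICS")  // stats the optimizer uses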
That was the case. Thanks for the quick and clean answer, Hemanth.
I pasted this right from the Spark shell (Spark 2.1.0):

scala> spark.sql("SELECT count(distinct col) FROM Table").show()
+-------------------+
|count(DISTINCT col)|
+-------------------+
|               4697|
+-------------------+

scala> spark.sql
Hello,
I've asked the following question [1] on Stack Overflow but haven't
gotten an answer yet. I'm now using this channel to give it more
visibility and hopefully find someone who can help.
"*Context.* I have tens of SQL queries stored in separate files. For
benchmarking purposes, I created an ap
in Parquet
tables. Any help solving or working around this would be much appreciated.
I noticed that for most SQL queries (sqlContext.sql(query)) I ran on
Parquet tables, some results are returned faster after the first and
second runs of the query. Is this variation normal, i.e. can two
executions of the same job take different times? Or is there some
intermediate result bein
6.id=p.id ORDER BY p.`bbb` LIMIT 10"
On 24.03.2016 22:16, Ted Yu wrote:
Can you obtain the output of explain(true) on the query after the
cacheTable() call?
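For reference, that one-liner would look like this (using the DataFrame
returned by sqlContext.sql):

sqlContext.sql(query).explain(true)  // prints parsed, analyzed, optimized and physical plans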
Potentially related JIRA:
[SPARK-13657] [SQL] Support parsing very long AND/OR expressions
On Thu, Mar 24, 2016 at 12:55 PM, Mo
If you can show a snippet of your code, that would help give us more clues.
Thanks
On Mar 24, 2016, at 2:43 AM, Mohamed Nadjib MAMI wrote:
Hi all,
I'm running SQL queries (sqlContext.sql()) on Parquet tables and facing a
problem with table caching (sqlContext.cacheTable()), using spark-she
Hi all,
I'm running SQL queries (sqlContext.sql()) on Parquet tables and facing
a problem with table caching (sqlContext.cacheTable()), using
spark-shell of Spark 1.5.1.
After I run sqlContext.cacheTable(table), sqlContext.sql(query)
takes longer the first time (well, for the lazy exe
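For context, cacheTable() is lazy: the first query pays the cost of building
the in-memory columnar cache. A common idiom is to materialize it once up
front; a minimal sketch (table name illustrative):

sqlContext.cacheTable("myTable")                          // lazy: only marks the table
sqlContext.sql("SELECT COUNT(*) FROM myTable").collect()  // forces materialization
// subsequent queries on myTable read from the in-memory cache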
Hi,
I have a pair RDD of the form: (mykey, (value1, value2))
How can I create a DataFrame having the schema [V1 String, V2 String] to
store [value1, value2] and save it into a Parquet table named "mykey"?
The createDataFrame() method takes an RDD and a schema (StructType) as
parameters. The sc
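A minimal sketch of one way to do this, assuming a SQLContext named
sqlContext and a pair RDD pairRDD: RDD[(String, (String, String))]
(key and output path are illustrative):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, StringType}

val schema = StructType(Seq(
  StructField("V1", StringType, nullable = true),
  StructField("V2", StringType, nullable = true)))

// Keep the rows for one key and wrap each value pair in a Row.
val rows = pairRDD.filter(_._1 == "mykey")
                  .map { case (_, (v1, v2)) => Row(v1, v2) }

sqlContext.createDataFrame(rows, schema)
  .write.parquet("/warehouse/mykey")  // a Parquet table named after the key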
Hello all,
Could someone please help me figure out what's wrong with the query I'm
running over Parquet tables? The query has the following form:
weird_query = "SELECT a.example.com/aa/1.1/aa,
b.example.com/bb/1.2/bb FROM www$aa@aa a LEFT JOIN www$bb@bb b ON
a.http://example.de/cc=b.co
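For reference: identifiers containing '.', '/', '$' or '@' usually need
backtick quoting in Spark SQL, otherwise the parser reads the dot as a
nested-field access. A sketch with illustrative names:

val q = """SELECT a.`example.com/aa/1.1/aa`, b.`example.com/bb/1.2/bb`
          |FROM `www$aa@aa` a LEFT JOIN `www$bb@bb` b
          |ON a.`http://example.de/cc` = b.`cc`""".stripMargin
sqlContext.sql(q).show()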
Hello all,
I'm getting the famous java.io.FileNotFoundException: ... (Too many
open files) exception. What seems to have helped other people out
hasn't worked for me. I tried to set the ulimit via the command line
("ulimit -n"), then I tried to add the following lines to
/etc/security/limits.con
Your jars are not delivered to the workers. Have a look at this:
http://stackoverflow.com/questions/24052899/how-to-make-it-easier-to-deploy-my-jar-to-spark-cluster-in-standalone-mode
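A minimal sketch of the programmatic variant (jar path is a placeholder):

// Ships the jar to every executor for tasks run on this SparkContext,
// much like passing --jars to spark-submit.
sc.addJar("/path/to/dependency.jar")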
Hello Sparkers,
I'm reading data from a CSV file, applying some transformations and ending
up with an RDD of pairs (String,Iterable<>).
I have already prepared Parquet files. I now want to take the previous
(key, value) RDD and populate the Parquet files as follows (see the
sketch below):
- key holds the name of the
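A minimal sketch of one way to finish this, assuming pairs:
RDD[(String, Iterable[String])] and that each key names a Parquet
directory (schema, names and paths illustrative):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, StringType}

val schema = StructType(Seq(StructField("value", StringType, nullable = true)))

// RDDs cannot be created inside RDD transformations, so bring the keys
// and their (assumed small) value collections to the driver first.
pairs.collect().foreach { case (key, values) =>
  val rows = sc.parallelize(values.toSeq).map(Row(_))
  sqlContext.createDataFrame(rows, schema)
    .write.mode("append").parquet(s"/warehouse/$key")
}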