You can use backticks to quote the column names.
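For example, in Spark SQL (a minimal sketch; the file name and the `Executor Info` field follow the question below, everything else is illustrative):

```scala
// Hypothetical event-log JSON containing a field named "Executor Info".
val df = sqlContext.jsonFile("spark-events.json")

// Backticks quote the identifier so the space is not a parse error:
df.selectExpr("`Executor Info`").show()

// The same works in a raw SQL query:
df.registerTempTable("events")
sqlContext.sql("SELECT `Executor Info` FROM events").show()
```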
Cheng
On 6/3/15 2:49 AM, David Mitchell wrote:
I am having the same problem reading JSON. There does not seem to be
a way of selecting a field that has a space in its name, such as
Executor Info from the Spark logs.
I suggest that we open a JIRA ticket to
Thanks Cheng, we have a workaround in place for Spark 1.3 (remove the
.metadata directory); good to know it will be resolved in 1.4.
-Don
On Sun, Jun 7, 2015 at 8:51 AM, Cheng Lian lian.cs@gmail.com wrote:
This issue has been fixed recently in Spark 1.4
- Remove localhost from the conf/slaves file and add the slaves' private IPs.
- Make sure master and slave machines are in the same security group (this
way all ports will be accessible to all machines).
- In the conf/spark-env.sh file, place export
SPARK_MASTER_IP=MASTER-NODES-PUBLIC-OR-PRIVATE-IP and
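A sketch of the relevant conf/spark-env.sh entries (the IP below is a placeholder; substitute your master node's address):

```shell
# conf/spark-env.sh -- standalone-mode master binding.
# Placeholder address; use your master's public or private IP.
export SPARK_MASTER_IP=10.0.0.1
# Optional: pin the master port explicitly (7077 is the default).
export SPARK_MASTER_PORT=7077
```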
Hi Sam,
Have a look at Sematext's SPM for your Spark monitoring needs. If the
problem is CPU, IO, network, etc., as Akhil mentioned, you'll see that in
SPM, too.
As for the number of jobs running, you can see a chart with that at
http://sematext.com/spm/integrations/spark-monitoring.html
Otis
--
On 6/6/15 9:06 AM, James Pirz wrote:
I am pretty new to Spark. Using Spark 1.3.1, I am trying to use
Spark SQL to run some SQL scripts on the cluster. I realized that
for better performance it is a good idea to use Parquet files. I
have 2 questions regarding that:
1) If I wanna
Hi,
How can I use HiveContext on the executors? If only the
driver can see HiveContext, does it mean I have to collect all datasets
(very large) to the driver and use HiveContext there? That would
overload the driver's memory and fail.
BR,
Patcharee
On 07. juni 2015 11:51,
Is it possible that some Parquet files of this data set have different
schemas from the others? Especially the ones reported in the exception messages.
One way to confirm this is to use parquet-tools [1] to inspect these
files:
$ parquet-schema path-to-file
Cheng
[1]:
Hi All,
I have a Spark SQL application to fetch data from Hive; on top I have an Akka
layer to run multiple queries in parallel.
*Please suggest a mechanism, so as to figure out the number of spark jobs
running in the cluster at a given instance of time. *
I need to do the above as, I see the
Usually Parquet can be more efficient because of its columnar nature.
Say your table has 10 columns but your join query only touches 3 of
them; Parquet only reads those 3 columns from disk, while Avro must load
all the data.
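As an illustrative sketch (paths and column names are hypothetical), only the projected columns are read from the Parquet files:

```scala
// Two Parquet-backed tables; only three columns are touched overall.
val people = sqlContext.parquetFile("hdfs:///warehouse/people")
val orders = sqlContext.parquetFile("hdfs:///warehouse/orders")

// Thanks to Parquet's columnar layout, only `id`, `name`, and `total`
// are actually read from disk for this join; the other columns are skipped.
val joined = people.join(orders, people("id") === orders("id"))
  .select(people("id"), people("name"), orders("total"))
```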
Cheng
On 6/5/15 3:00 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
We currently have data in
Spark SQL supports Hive dynamic partitioning, so one possible workaround
is to create a Hive table partitioned by zone, z, year, and month
dynamically, and then insert the whole dataset into it directly.
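A hedged sketch of that workaround through HiveContext (table and column names are illustrative, not from the original thread):

```scala
// Assumes `hiveContext` is a HiveContext; all names are illustrative.
hiveContext.sql("SET hive.exec.dynamic.partition = true")
hiveContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

hiveContext.sql("""
  CREATE TABLE IF NOT EXISTS weather_part (value DOUBLE)
  PARTITIONED BY (zone INT, z INT, year INT, month INT)
  STORED AS PARQUET
""")

// Partition columns must come last in the SELECT list; Hive then routes
// each row to its partition dynamically during the insert.
hiveContext.sql("""
  INSERT OVERWRITE TABLE weather_part PARTITION (zone, z, year, month)
  SELECT value, zone, z, year, month FROM raw_weather
""")
```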
In 1.4, we also provide dynamic partitioning support for non-Hive
environments, and you
What is a decision list? An in-order traversal (or some other traversal) of
a fitted decision tree?
On Jun 5, 2015 1:21 AM, Sateesh Kavuri sateesh.kav...@gmail.com wrote:
Is there an existing way in SparkML to convert a decision tree to a
decision list?
On Thu, Jun 4, 2015 at 10:50 PM, Reza Zadeh
For the following code:
val df = sqlContext.parquetFile(path)
`df` remains columnar (actually it just reads from the columnar Parquet
file on disk). For the following code:
val cdf = df.cache()
`cdf` is also columnar but that's different from Parquet. When a
DataFrame is cached,
Were you using HiveContext.setConf()?
dfs.replication is a Hadoop configuration, and setConf() is only used
to set Spark SQL specific configurations. You may set it in your
Hadoop core-site.xml instead.
Cheng
On 6/2/15 2:28 PM, Haopu Wang wrote:
Hi,
I'm trying to save SparkSQL DataFrame
Are you calling hiveContext.sql within an RDD.map closure or something
similar? In that case, the call actually happens on the executor side.
However, HiveContext only exists on the driver side.
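A sketch of the distinction (the query and table names are illustrative): issue the SQL once on the driver and let Spark distribute the resulting DataFrame's operations, instead of calling hiveContext inside a closure.

```scala
// WRONG: this closure runs on executors, where no HiveContext exists.
// rdd.foreach { key => hiveContext.sql(s"SELECT ... WHERE k = '$key'") }

// RIGHT: issue the query on the driver; the returned DataFrame's
// operations are then distributed as ordinary Spark jobs.
val result = hiveContext.sql("SELECT k, v FROM some_table")  // driver side
val pairs = result.rdd.map(row => (row.getString(0), row.getString(1)))
println(pairs.count())  // runs as a distributed job
```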
Cheng
On 6/4/15 3:45 PM, patcharee wrote:
Hi,
I am using Hive 0.14 and Spark 0.13. I got
It could be a CPU, IO, or network bottleneck; you need to figure out where
exactly it's choking. You can use certain monitoring utilities (like top)
to understand it better.
Thanks
Best Regards
On Sun, Jun 7, 2015 at 4:07 PM, SamyaMaiti samya.maiti2...@gmail.com
wrote:
Hi All,
I have a Spark
This issue has been fixed recently in Spark 1.4
https://github.com/apache/spark/pull/6581
Cheng
On 6/5/15 12:38 AM, Marcelo Vanzin wrote:
I talked to Don outside the list and he says that he's seeing this
issue with Apache Spark 1.3 too (not just CDH Spark), so it seems like
there is a real
Interesting, someone just posted exactly the same question on another
thread :) My answer there is quoted below:
For the following code:
val df = sqlContext.parquetFile(path)
`df` remains columnar (actually it just reads from the columnar
Parquet file on disk). For the following code:
What is the code used to set up the kafka stream?
On Sat, Jun 6, 2015 at 3:23 PM, EH eas...@gmail.com wrote:
And here is the thread dump, where it seems every worker is waiting for
Executor #6 Thread 95: sparkExecutor-akka.actor.default-dispatcher-22
(RUNNABLE) to be complete:
Thread 41:
Another approach would be to use ZooKeeper. If you have ZooKeeper
running somewhere in the cluster, you can simply create a path like
*/dynamic-list* in it and then write objects/values to it; you can even
create/access nested nodes.
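A minimal sketch with the plain ZooKeeper Java client (the connection string and paths are placeholders; in practice a wrapper like Curator also handles retries and "node exists" cases):

```scala
import org.apache.zookeeper.{CreateMode, ZooKeeper}
import org.apache.zookeeper.ZooDefs.Ids

// Placeholder connection string; point it at your ensemble.
val zk = new ZooKeeper("zk-host:2181", 30000, null)

// Parent node for the dynamic list (throws NodeExistsException on reruns).
zk.create("/dynamic-list", new Array[Byte](0),
  Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)

// Each appended value becomes a sequential child znode.
zk.create("/dynamic-list/item-", "some-value".getBytes("UTF-8"),
  Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL)

// Read the current list back.
import scala.collection.JavaConverters._
val items = zk.getChildren("/dynamic-list", false).asScala
```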
Thanks
Best Regards
On Fri, Jun 5, 2015 at 7:06 PM,
Which consumer are you using? If you can paste the complete code then
maybe I can try reproducing it.
Thanks
Best Regards
On Sun, Jun 7, 2015 at 1:53 AM, EH eas...@gmail.com wrote:
And here is the thread dump, where it seems every worker is waiting for
Executor
#6 Thread 95:
Hi spark users:
After I submitted a SparkPi job to Spark, the driver crashed at the end of the
job with the following log:
WARN EventLoggingListener: Event log dir
file:/d:/data/SparkWorker/work/driver-20150607200517-0002/logs/event does not
exists, will newly create one.
Exception in thread
Hi,
I'm trying to write a custom transformer in Spark ML, and since that uses
DataFrames, I am trying to use the flatMap function of the DataFrame class
in Java. Can you share a simple example of how to use flatMap to do word
count on a single column of the DataFrame? Thanks.
Dimple
Thanks for replying twice :) I think I sent this question by email and
somehow thought I did not send it, hence created the other one on the web
interface. Let's retain this thread since you have provided more details
here.
Great, it confirms my intuition about DataFrame. It's similar to Shark
Hi,
I'm trying to write a custom transformer in Spark ML, and since that uses
DataFrames, I am trying to use the flatMap function of the DataFrame class
in Java. Can you share a simple example of how to use flatMap to do word
count on a single column of the DataFrame? Thanks
Dimple
Am I right in thinking that Python mllib does not contain the optimization
module? Are there plans to add this to the Python API?
Hi,
This is expected behavior. HiveContext.sql (and also
DataFrame.registerTempTable) is only expected to be invoked on the driver
side. However, the closure passed to RDD.foreach is executed on the executor
side, where no viable HiveContext instance exists.
Cheng
On 6/7/15 10:06 AM, patcharee