Thanks Sean, your suggestions and the links provided are just what I needed
to start off with.
On Sun, Mar 15, 2015 at 6:16 PM, Sean Owen so...@cloudera.com wrote:
I think you're assuming that you will pre-compute recommendations and
store them in Mongo. That's one way to go, with certain
Currently there's no convenient way to convert a
`SchemaRDD`/`JavaSchemaRDD` back to an `RDD`/`JavaRDD` of some case
class. But you can convert a `SchemaRDD`/`JavaSchemaRDD` into an
`RDD[Row]`/`JavaRDD<Row>` using `schemaRdd.rdd` and `new
JavaRDD<Row>(schemaRdd.rdd)`.
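For example, a minimal sketch, assuming a Person(name, age) case class and
that the column order matches the constructor:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

case class Person(name: String, age: Int)

// Drop back to RDD[Row], then rebuild the case class by position.
val rows: RDD[Row] = schemaRdd.rdd
val people: RDD[Person] = rows.map(r => Person(r.getString(0), r.getInt(1)))
```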
Cheng
On 3/15/15 10:22 PM, Renato
Have you fixed this issue?
I think (I hope) it's because the generic builds just work. Even
though these are of course distributed mostly verbatim in CDH5, with
tweaks to be compatible with other stuff at the edges, the stock
builds should be fine too. Same for HDP as I understand.
The CDH4 build may work on some builds of
Hi Spark experts,
Is there a way to convert a JavaSchemaRDD (for instance, one loaded from a
Parquet file) back to a JavaRDD of a given case class? I read on
Stack Overflow [1] that I could do a select over the Parquet file and then
get the fields out by reflection, but I guess that would be an
Thank you for your help. toDF() solved my first problem. And the
second issue was a non-issue, since the second example worked without any
modification.
David
On Sun, Mar 15, 2015 at 1:37 AM, Rishi Yadav ri...@infoobjects.com wrote:
programmatically specifying a schema needs an import
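A minimal sketch of what that looks like in the Spark 1.3 API (names like
rowRDD and sqlContext are assumptions):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Build the schema explicitly instead of inferring it from a case class.
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)))
val df = sqlContext.createDataFrame(rowRDD, schema)
```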
I am doing the word count example on a Flume stream and trying to save the
output as text files in HDFS, but in the save directory I get multiple
sub-directories, each containing small files. I wonder if there is a way to
append to one large file instead of saving many small ones, as I intend
to
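One common workaround is to collapse each batch to a single partition before
writing; a sketch, assuming a DStream[(String, Int)] named wordCounts and a
hypothetical output path:

```scala
// Each batch directory then holds one part file instead of many small ones.
wordCounts.foreachRDD { (rdd, time) =>
  rdd.coalesce(1).saveAsTextFile(s"hdfs:///flume-wordcount/batch-${time.milliseconds}")
}
```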
I think you're assuming that you will pre-compute recommendations and
store them in Mongo. That's one way to go, with certain tradeoffs. You
can precompute offline easily, and serve results at large scale
easily, but you are forced to precompute everything -- lots of wasted
effort, not completely
Ah, most interesting. Thanks.
So it seems sc.textFile(longFileList) has to read all the metadata, for
partitioning purposes, before starting the read, so what you do is simply not use it?
You create a task per file that reads one file (in parallel) per task without
scanning _all_ the metadata. Can't argue
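A sketch of the one-task-per-file idea (assuming the paths are readable from
every executor, e.g. a shared mount):

```scala
// One partition per file: each task opens exactly one file, so the
// driver never has to scan metadata for the whole file list up front.
val files: Seq[String] = longFileList // hypothetical list of paths
val lines = sc.parallelize(files, numSlices = files.size).flatMap { path =>
  scala.io.Source.fromFile(path).getLines().toList
}
```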
I was having a similar issue, but with Spark and Flume integration: I was
getting a 'failed to bind' error. I got it fixed by shutting down the firewall
on both machines (make sure `service iptables status` shows the firewall stopped).
Is there a reason why the prebuilt releases don't include current CDH distros
and YARN support?
Eric Friedman
I got a NullPointerException in aggregateMessages on a graph which is
the output of the mapVertices function of another graph. I found that the
problem is because the mapVertices function did not affect all the triplets
of the graph.
// Initialize the graph, assigning a counter to each vertex that contains
the
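For reference, a minimal sketch of that initialization pattern (graph and its
types are assumptions):

```scala
import org.apache.spark.graphx._

// Assign a counter attribute to every vertex, then aggregate over edges.
val initialized = graph.mapVertices((id, _) => 0)
val counts: VertexRDD[Int] =
  initialized.aggregateMessages[Int](ctx => ctx.sendToDst(1), _ + _)
```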
In the LBFGS version of logistic regression, the data is properly
standardized, so this should not happen. Can you provide a copy of
your dataset to us so we can test it? If the dataset cannot be
public, can you just send me a copy so I can dig into this? I'm
the author of LORWithLBFGS. Thanks.
Yes, I don't think this is entirely reliable in general. I would emit
(label, features) pairs and then transform the values.
In practice, this may happen to work fine in simple cases.
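A sketch of that pattern, using StandardScaler as an example transformation
(the variable names are assumptions):

```scala
import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.regression.LabeledPoint

// Keep each label glued to its features, then transform only the values.
val pairs = data.map(lp => (lp.label, lp.features)) // data: RDD[LabeledPoint]
val scaler = new StandardScaler(withMean = false, withStd = true).fit(pairs.values)
val scaled = pairs.mapValues(scaler.transform).map { case (l, v) => LabeledPoint(l, v) }
```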
On Sun, Mar 15, 2015 at 3:51 AM, kian.ho hui.kian.ho+sp...@gmail.com wrote:
Hi, I was taking a look through the
Hi,
Can anyone who has developed a recommendation engine suggest a possible
software stack for such an application?
I am basically new to recommendation engines; I just found Mahout and
Spark MLlib, which are available.
I am thinking of the software stack below.
1. The user is
Spark SQL supports most commonly used features of HiveQL. However,
different HiveQL statements are executed in different manners:
1.
DDL statements (e.g. `CREATE TABLE`, `DROP TABLE`, etc.) and
commands (e.g. `SET key = value`, `ADD FILE`, `ADD JAR`, etc.)
In most cases, Spark SQL
Hi
I would like to share my comments on Hortonworks' benchmarks of
'Hive on Tez' vs. 'Hive on Spark' vs. 'Spark SQL'.
Please find them in my related blog entry at http://goo.gl/K5mk0U
Thanks
Slim Baltagi
Chicago, IL
http://www.SparkBigData.com
--
View this message in context:
Hello all,
Thank you for your responses. I did try to include the
zookeeper.znode.parent property in hbase-site.xml, but it still gives
the same error.
I am using Spark 1.2.0 and HBase 0.98.9.
Could you please suggest what else could be done?
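If hbase-site.xml is not actually on the classpath, one sanity check (a
sketch with placeholder values) is to set the properties programmatically:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration

val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3") // hypothetical hosts
hbaseConf.set("zookeeper.znode.parent", "/hbase")      // adjust to your deployment
```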
On Fri, Mar 13, 2015 at 10:25 PM, Ted
That's an unfortunate documentation bug in the programming guide... We
failed to update it after making the change.
Cheng
On 2/28/15 8:13 AM, Deborah Siegel wrote:
Hi Michael,
Would you help me understand the apparent difference here?
The Spark 1.2.1 programming guide indicates:
Note
Thanks,
It worked.
-Abhi
On Tue, Mar 3, 2015 at 5:15 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Hi,
On Wed, Mar 4, 2015 at 6:20 AM, Zhan Zhang zzh...@hortonworks.com wrote:
Do you have enough resources in your cluster? You can check your resource
manager to see the usage.
Yep, I can
This article by Ryan Blue should be helpful for understanding the problem:
http://ingest.tips/2015/01/31/parquet-row-group-size/
The TL;DR is, you may decrease `parquet.block.size` to reduce memory
consumption. Anyway, 100K columns is a really big burden for Parquet,
but I guess your data should
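For instance, a sketch of lowering the row group size before writing (the
32 MB value is only illustrative):

```scala
// Smaller row groups mean less buffered data per open Parquet writer.
sc.hadoopConfiguration.setInt("parquet.block.size", 32 * 1024 * 1024)
```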
Thanks Nick, for your suggestions.
On Sun, Mar 15, 2015 at 10:41 PM, Nick Pentreath nick.pentre...@gmail.com
wrote:
As Sean says, precomputing recommendations is pretty inefficient. Though
with 500k items it's easy to get all the item vectors in memory, so
pre-computing is not too bad.
Still,
Hi again
Tried the same
examples/src/main/scala/org/apache/spark/examples/mllib/StreamingLinearRegression.scala
from 1.3.0
and am getting the problem when the test file content is:
(0.0,[3.0,4.0,3.0])
(0.0,[4.0,4.0,4.0])
(4.0,[5.0,5.0,5.0])
(5.0,[5.0,6.0,6.0])
(6.0,[7.0,4.0,7.0])
Hi All,
I am trying to submit a Spark application using the YARN REST API. I am able
to submit the application, but the final status shows as 'UNDEFINED'. A couple
of other observations:
The user shows as dr.who
The application type is empty, though I specify it as Spark
Has anyone had this problem before?
I
The parquet-tools code should be pretty helpful (although it's Java)
https://github.com/apache/incubator-parquet-mr/tree/master/parquet-tools/src/main/java/parquet/tools/command
On 3/10/15 12:25 AM, Shuai Zheng wrote:
Hi All,
I have a lot of parquet files, and I try to open them directly
Hey Yong,
It seems that Hadoop `FileSystem` adds the size of a whole block to the
metrics even if you only touch a fraction of it (reading Parquet
metadata, for example). This behavior can be verified by the following
snippet:
```scala
import org.apache.spark.sql.Row
import
As Sean says, precomputing recommendations is pretty inefficient. Though with
500k items it's easy to get all the item vectors in memory, so pre-computing is
not too bad.
Still, since you plan to serve these via a REST service anyway, computing on
demand via a serving layer such as Oryx or
It should be OK. If you encounter problems with a long-lived connection
to the Thrift server, that would be a bug.
Cheng
On 3/9/15 6:41 PM, fanooos wrote:
I have some applications developed using PHP, and currently we have a problem
connecting these applications to the Spark SQL Thrift
"org.apache.hbase" % "hbase" % "0.98.9-hadoop2" % "provided",
There is no module in HBase 0.98.9 called hbase, but this would not be the
root cause of the error.
Most likely hbase-site.xml was not picked up, meaning this is a classpath
issue.
On Sun, Mar 15, 2015 at 10:04 AM, HARIPRIYA AYYALASOMAYAJULA
Hi SM,
Apologies for the delayed response.
No, the issue is with Spark 1.2.0; there is a bug in Spark 1.2.0.
Spark recently had its 1.3.0 release, so it might be fixed there.
I am not planning to test it soon, maybe after some time.
You can try it.
Regards,
Shailesh
Thanks Cheng for the great explanation!
bit1...@163.com
From: Cheng Lian
Date: 2015-03-16 00:53
To: bit1...@163.com; Wang, Daoyuan; user
Subject: Re: Explanation on the Hive in the Spark assembly
Spark SQL supports most commonly used features of HiveQL. However, different
HiveQL statements
Thanks, Jerry
I got it that way. Just wanted to make sure whether there is some option to
directly specify the Tachyon version.
fightf...@163.com
From: Shao, Saisai
Date: 2015-03-16 11:10
To: fightf...@163.com
CC: user
Subject: RE: Building spark over specified tachyon
I think you could change the
Can you share some sample data?
On Sun, Mar 15, 2015 at 8:51 PM, Rohit U rjupadhy...@gmail.com wrote:
Hi,
I am trying to run LogisticRegressionWithSGD on RDD of LabeledPoints
loaded using loadLibSVMFile:
val logistic: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc,
Hi Sparkers,
I am not able to run spark-sql on Spark. Please find the following error:
Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
Regards,
Sandeep.v
Hi
The video recording of this talk, titled 'Spark or Hadoop: is it an either-or
proposition?', given at the Los Angeles Spark Users Group on March 12, 2015, is
now available on YouTube at this link: http://goo.gl/0iJZ4n
Thanks
Slim Baltagi
http://www.SparkBigData.com
Can you provide more information?
Such as:
Version of Spark you're using
Command line
Thanks
On Mar 15, 2015, at 9:51 PM, sandeep vura sandeepv...@gmail.com wrote:
Hi Sparkers,
I am not able to run spark-sql on Spark. Please find the following error:
Unable to instantiate
I think you could change the pom file under the Spark project to update the
Tachyon-related dependency version and rebuild it (in case the API is
compatible and the behavior is the same).
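The edit would look roughly like this (the groupId/artifactId and version are
assumptions about that dependency's coordinates):

```xml
<!-- in the Spark pom: bump the Tachyon client to the desired release -->
<dependency>
  <groupId>org.tachyonproject</groupId>
  <artifactId>tachyon-client</artifactId>
  <version>0.6.0</version>
</dependency>
```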
I'm not sure whether there is any command you can use to compile against a
specific Tachyon version.
Thanks
Jerry
From:
Hi,
I am running k-means using Spark in local mode. My data set is about 30k
records, and I set k = 1000.
The algorithm started and finished 13 jobs according to the UI monitor, then
it stopped working.
The last log I saw was:
[Spark Context Cleaner] INFO org.apache.spark.ContextCleaner -
I checked the labels across the entire dataset and it looks like it has -1
and 1 (not the 0 and 1 I originally expected). I will try replacing the -1
with 0 and run it again.
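A sketch of that remapping, assuming an RDD[LabeledPoint] named points:

```scala
import org.apache.spark.mllib.regression.LabeledPoint

// LogisticRegressionWithSGD expects labels in {0, 1}, not {-1, 1}.
val binary = points.map { lp =>
  LabeledPoint(if (lp.label == -1.0) 0.0 else lp.label, lp.features)
}
```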
On Mon, Mar 16, 2015 at 12:51 AM, Rishi Yadav ri...@infoobjects.com wrote:
ca you share some sample data
On Sun, Mar
Hi Ted,
I am using Spark 1.2.1 and Hive 0.13.1; you can check my configuration
files attached below.
ERROR IN SPARK
n: Unable to instantiate
org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at
Hi, all
Noting that the current Spark releases are built with Tachyon 0.5.0: if we
want to recompile Spark with Maven, targeting a specific Tachyon version
(let's say the most recent 0.6.0 release), how should that be done? What
should the Maven compile command look like?
Thanks,
Sun.
I figured out how to use local files with file:// but not with either the
persistent or ephemeral-hdfs
Thanks, Haoyuan.
fightf...@163.com
From: Haoyuan Li
Date: 2015-03-16 12:59
To: fightf...@163.com
CC: Shao, Saisai; user
Subject: Re: RE: Building spark over specified tachyon
Here is a patch: https://github.com/apache/spark/pull/4867
On Sun, Mar 15, 2015 at 8:46 PM, fightf...@163.com
Hi Margus, thanks for reporting this. I've been able to reproduce it, and
there does indeed appear to be a bug. I've created a JIRA and have a fix
ready; hopefully it can be included in 1.3.1.
In the meantime, you can get the desired result using transform:
model.trainOn(trainingData)
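A sketch of how the transform workaround might continue (assuming a DStream of
LabeledPoints named testData; latestModel comes from the streaming regression API):

```scala
// Predict inside transform against the most recently trained model.
testData.transform { rdd =>
  rdd.map(lp => (lp.label, model.latestModel().predict(lp.features)))
}.print()
```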
Hi Ted,
Did you find any solution?
Thanks
Sandeep
On Mon, Mar 16, 2015 at 10:44 AM, sandeep vura sandeepv...@gmail.com
wrote:
Hi Ted,
I am using Spark 1.2.1 and Hive 0.13.1; you can check my configuration
files attached below.
ERROR IN SPARK