Do you have one document per file, or multiple documents per file?
On 4 Jul 2015 23:38, Michal Čizmazia mici...@gmail.com wrote:
SparkContext has a method wholeTextFiles. Is that what you need?
On 4 July 2015 at 07:04, rishikesh rishikeshtha...@hotmail.com wrote:
Hi
I am new to Spark
Hey all,
Is it possible to reliably get the version string of a Spark cluster prior
to trying to connect via the SparkContext on the client side? Most of the
errors I've seen on mismatched versions have been cryptic, so it would be
helpful if I could throw an exception earlier.
I know it is
To somewhat answer my own question: it looks like an empty request to the REST
API will throw an error whose JSON response also includes the version.
Still not ideal, though. Would there be any objection to adding a simple
version endpoint to the API?
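For anyone hitting the same thing, a rough sketch of that workaround; the port,
endpoint path, and serverSparkVersion field name are assumptions based on the
standalone master's REST submission server, so adjust for your deployment:

import java.net.{HttpURLConnection, URL}
import scala.io.Source

// Probe the master's REST submission port and pull the version out of the
// JSON body, which comes back even for a bad request.
def probeServerVersion(masterHost: String, restPort: Int = 6066): Option[String] = {
  val conn = new URL(s"http://$masterHost:$restPort/v1/submissions/status/unknown")
    .openConnection().asInstanceOf[HttpURLConnection]
  conn.getResponseCode // triggers the request
  val stream = Option(conn.getErrorStream).getOrElse(conn.getInputStream)
  val body = try Source.fromInputStream(stream).mkString finally conn.disconnect()
  """"serverSparkVersion"\s*:\s*"([^"]+)"""".r.findFirstMatchIn(body).map(_.group(1))
}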
On Sat, Jul 4, 2015 at 4:00 PM, Patrick Woody
Though I have set hive.security.authorization.enabled=true and
hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory,
user X can select a table belonging to user Y, as for some reason the Spark SQL
Thrift server is not doing
Hi,
I'm just getting started with Spark, so apologies if I'm missing
something obvious. In the below, I'm using Spark 1.4.
I've created a partitioned table in S3 (call it 'dataset'), with basic
structure like so:
s3://bucket/dataset/pk=a
s3://bucket/dataset/pk=b
s3://bucket/dataset/pk=c
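For what it's worth, a minimal sketch of reading that layout with 1.4's
partition discovery; the Parquet format is an assumption (the message is cut
off here), but with it pk shows up as a regular column:

val dataset = sqlContext.read.parquet("s3://bucket/dataset")
dataset.filter(dataset("pk") === "a").show()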
Currently the number of retries is hardcoded.
You may want to open a JIRA to make the retry count configurable.
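For illustration, the configurable version might read the count from SparkConf
instead of the hardcoded constant; this is a sketch only, and the config key is
hypothetical, not an existing Spark setting:

// Instead of a hardcoded value such as REGISTRATION_RETRIES = 3 in AppClient,
// read the count from the application's SparkConf (hypothetical key name):
private val registrationRetries =
  conf.getInt("spark.deploy.client.registrationRetries", 3)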
Cheers
On Thu, Jul 2, 2015 at 8:35 PM, luohui20...@sina.com wrote:
Hi there,
I checked the source code and found that in
org.apache.spark.deploy.client.AppClient, there
Please take a look
at
streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala
def saveAsHadoopFiles[F <: OutputFormat[K, V]](
prefix: String,
suffix: String
)(implicit fm: ClassTag[F]): Unit = ssc.withScope {
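A hedged usage sketch for that method on a pair DStream; the output format,
types, and path are illustrative:

import org.apache.hadoop.mapred.TextOutputFormat

// wordCounts: DStream[(String, Int)]. Each batch interval is written to a
// directory named <prefix>-<batch time>.<suffix>.
wordCounts.saveAsHadoopFiles[TextOutputFormat[String, Int]](
  "hdfs:///user/hadoop/streaming/counts", "txt")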
Cheers
On Sat, Jul 4, 2015 at 5:23
Hello,
I am having issues splitting the contents of a DataFrame column using Spark
1.4. The DataFrame was created by reading a nested, complex JSON file. I used
df.explode but keep getting an error message.
scala> val df = sqlContext.read.json("/Users/xx/target/statsfile.json")
scala> df.show()
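In case it helps, the 1.4 explode variant that takes a function expects the
input and output column names plus the splitting logic; a minimal sketch, where
the column name "tags" and the comma delimiter are made up since the actual
schema isn't shown:

scala> val exploded = df.explode("tags", "tag") { tags: String => tags.split(",").toSeq }
scala> exploded.select("tag").show()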
Hi,
Just looking for some clarity on the 1.4 documentation quoted below.
And restarting from earlier checkpoint information of pre-upgrade code
cannot be done. The checkpoint information essentially contains serialized
Scala/Java/Python objects and trying to deserialize objects with new,
modified
Hi
Thanks, I guess this will solve my problem. I will load multiple files using
wildcards like *.csv. I guess if I use wholeTextFiles instead of textFile, I
will get the whole file contents as the value, which will in turn ensure one
feature vector per file.
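Something along these lines is what I have in mind, as a sketch; whitespace
tokenization and hashing TF are assumptions, and the path is illustrative:

import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// wholeTextFiles yields one (filename, contents) pair per file, so mapping
// over the values gives exactly one term-frequency vector per document.
val docs = sc.wholeTextFiles("hdfs:///data/docs/*.csv")
val hashingTF = new HashingTF()
val vectors: RDD[(String, Vector)] =
  docs.mapValues(text => hashingTF.transform(text.split("""\s+""").toSeq))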
Thanks,
Nitin
Date: Sat, 4 Jul 2015 09:37:52
I have one document per file and each file is to be converted to a feature
vector. Pretty much like standard feature construction for document
classification.
Thanks,
Rishi
Date: Sun, 5 Jul 2015 01:44:04 +1000
Subject: Re: Feature Generation On Spark
From: guha.a...@gmail.com
To:
Hi All
I have a requirement to connect to a DB every few minutes and bring data into
HBase. Can anyone suggest whether Spark Streaming would be appropriate for this
scenario, or should I look into jobserver?
Thanks in advance
--
Best Regards,
Ayan Guha
See this thread:
http://search-hadoop.com/m/q3RTt4CqUGAvnPj2/Spark+master+build&subj=Re+Can+not+build+master
On Jul 4, 2015, at 9:44 PM, Alec Taylor alec.tayl...@gmail.com wrote:
Running: `build/mvn -DskipTests clean package` on Ubuntu 15.04 (amd64,
3.19.0-21-generic) with Apache Maven
Thanks, will just build from spark-1.4.0.tgz in the meantime.
On Sun, Jul 5, 2015 at 2:52 PM, Ted Yu yuzhih...@gmail.com wrote:
See this thread:
http://search-hadoop.com/m/q3RTt4CqUGAvnPj2/Spark+master+build&subj=Re+Can+not+build+master
On Jul 4, 2015, at 9:44 PM, Alec Taylor
I'm computing connected components using Spark GraphX on AWS EC2. I believe
the computation was successful, as I saw the type information of the final
result. However, it looks like Spark was doing some cleanup. The
BlockManager removed a bunch of blocks and then got stuck at
15/07/04 21:53:06 INFO
Hello,
How should I write a text file stream (a DStream) to HDFS?
I tried the following:
val lines = ssc.textFileStream("hdfs:/user/hadoop/spark/streaming/input/")
lines.saveAsTextFile("hdfs:/user/hadoop/output1")
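One hedged note: DStream has no saveAsTextFile; the streaming variant is
saveAsTextFiles(prefix, suffix), which writes one directory per batch, so a
sketch of the fix would be:

val lines = ssc.textFileStream("hdfs:/user/hadoop/spark/streaming/input/")
lines.saveAsTextFiles("hdfs:/user/hadoop/output1/batch", "txt")
ssc.start()
ssc.awaitTermination()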
Running: `build/mvn -DskipTests clean package` on Ubuntu 15.04 (amd64,
3.19.0-21-generic) with Apache Maven 3.3.3 starts to build fine, then just
keeps outputting these lines:
[INFO] Dependency-reduced POM written at:
/spark/bagel/dependency-reduced-pom.xml
I've kept it running for an hour.
How
I had a similar inquiry, copied below.
I was also looking into making an SQS Receiver reliable:
http://stackoverflow.com/questions/30809975/reliable-sqs-receiver-for-spark-streaming
Hope this helps.
-- Forwarded message --
From: Tathagata Das t...@databricks.com
Date: 20 June
Hi
I am new to Spark and am working on document classification. Before model
fitting I need to do feature generation: each document is to be converted to
a feature vector. However, I am not sure how to do that. While testing
locally, I have a static list of tokens, and when I parse a file I do a