Got it - I needed to use the when/otherwise construct; code below:

from pyspark.sql.functions import when, next_day, datediff

def getSunday(day):
    day = day.cast("date")
    # next_day returns the first Sunday strictly after 'day'
    sun = next_day(day, "Sunday")
    n = datediff(sun, day)
    # a gap of 7 days means 'day' itself was already a Sunday
    x = when(n == 7, day).otherwise(sun)
    return x
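For anyone following along outside Spark, a plain-Python sketch (datetime only, no pyspark) that mirrors what the when/otherwise expression computes:

```python
from datetime import date, timedelta

def get_sunday(day):
    """Pure-Python equivalent of the when/otherwise expression above."""
    # Spark's next_day(day, "Sunday") returns the first Sunday strictly
    # after 'day' (Python's weekday(): Mon=0 .. Sun=6).
    delta = (6 - day.weekday()) % 7
    if delta == 0:
        delta = 7
    sun = day + timedelta(days=delta)
    # when(n == 7, day).otherwise(sun): a 7-day gap means 'day'
    # was already a Sunday, so return it unchanged.
    return day if (sun - day).days == 7 else sun

print(get_sunday(date(2016, 1, 6)))   # a Wednesday -> 2016-01-10
print(get_sunday(date(2016, 1, 10)))  # already a Sunday -> 2016-01-10
```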
On 10 January 2016 at 08:41, Franc Carter wrote:
>
> My Python is
Code:
val sc = new SparkContext(sparkConf)
sc.addJar("/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-streaming-kafka-assembly_2.10-1.6.0.jar")
>spark-submit --class "GeoIP" target/scala-2.10/geoip-assembly-1.0.jar
The log shows the jar was added:
16/01/09 16:05:20 INFO SparkContext: Added JAR
/opt/spark-1.6.0-bin-
Hi,
The code below gives me an unexpected result. I expected that
StandardScaler (in ml, not mllib) would take a specified column of an input
DataFrame, subtract the mean of the column, and divide the difference by
the standard deviation of that column.
However, Spark gives me the error
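For comparison, a stdlib-only sketch of the arithmetic described above (this is not the Spark API; note too that, if I recall the defaults correctly, ml's StandardScaler defaults to withMean=False, which can produce results that look "unscaled" around zero):

```python
from statistics import mean, stdev  # stdev uses the sample (n-1) denominator

def standardize(col, with_mean=True, with_std=True):
    """Sketch of per-column standardization: (x - mean) / stdev."""
    m = mean(col) if with_mean else 0.0
    s = stdev(col) if with_std else 1.0
    return [(x - m) / s for x in col]

vals = [1.0, 2.0, 3.0, 4.0, 5.0]
print(standardize(vals))  # symmetric around 0 after centering
```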
Hi,
I have a DataFrame with the columns
ID,Year,Value
I'd like to create a new column that is Value2 - Value1, where Value1 comes
from the row with the corresponding Year1 = Year2 - 1
At the moment I am creating a new DataFrame with renamed columns and doing
DF.join(DF2, . . . .)
This looks cumbersome to me; is there abt
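A window function (lag over Window.partitionBy("ID").orderBy("Year"), if my memory of the pyspark.sql.functions API is right) is the usual alternative to a self-join here. As a plain-Python illustration of the result being computed (the sample data is made up):

```python
# A minimal sketch (plain Python, not the DataFrame API) of the intended
# result: for each ID, subtract the previous year's Value from this year's.
rows = [
    ("a", 2014, 10.0),
    ("a", 2015, 12.5),
    ("b", 2014, 3.0),
    ("b", 2015, 2.0),
]

prev = {(i, y): v for (i, y, v) in rows}  # lookup by (ID, Year)
diffs = {
    (i, y): v - prev[(i, y - 1)]
    for (i, y, v) in rows
    if (i, y - 1) in prev  # skip years with no prior-year row
}
print(diffs)  # {('a', 2015): 2.5, ('b', 2015): -1.0}
```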
My Python is not particularly good, so I'm afraid I don't understand what
that means
cheers
On 9 January 2016 at 14:45, Franc Carter wrote:
>
> Hi,
>
> I'm trying to write a short function that returns the last sunday of the
> week of a given date, code below
>
> def getSunday(day):
>
> day
Hi, All:
I'm trying to read and write from the HDFS cluster using the SparkSQL HiveContext.
My current build of Spark is 1.5.2. The problem is that our company currently
has a very old version of HDFS (Hadoop 2.1.0) and Hive metastore (0.11) from the
Hortonworks bundle.
One of the possible solution i
Please take a look at:
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IDESetup
On Sat, Jan 9, 2016 at 11:16 AM, Jorge Machado wrote:
> Hello everyone,
>
>
> I'm just wondering how you guys develop for Spark.
>
> For example I cannot find any dec
Hello everyone,
I'm just wondering how you guys develop for Spark.
For example I cannot find any decent documentation for connecting Spark to
Eclipse using maven or sbt.
Is there any link around?
Jorge
thanks
Yes, the tiered storage feature in Tachyon can address this issue. Here is
a link to more information:
http://tachyon-project.org/documentation/Tiered-Storage-on-Tachyon.html
Thanks,
Gene
On Wed, Jan 6, 2016 at 8:44 PM, Ted Yu wrote:
> Have you seen this thread ?
>
> http://search-hadoop.com/m/
Hi,
In my app, I have a Params scala object that keeps all the specific
(hyper)parameters of my program. This object is read in each worker. I would
like to be able to pass specific values of the Params' fields in the command
line. One way would be to simply update all the fields of the Params obj
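One common pattern for the command-line part is to parse --key=value pairs and overwrite matching fields before the object is shipped to workers. A hypothetical sketch (the field names and Params shape here are made up, and this is plain Python rather than the poster's Scala object):

```python
# Hypothetical sketch: override fields of a params holder from the
# command line. All names here are illustrative, not from the thread.
class Params:
    learning_rate = 0.01
    num_iters = 100

def apply_overrides(params, argv):
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            key, raw = arg[2:].split("=", 1)
            if hasattr(params, key):
                current = getattr(params, key)
                # coerce the string to the field's existing type
                setattr(params, key, type(current)(raw))
    return params

apply_overrides(Params, ["--learning_rate=0.1", "--num_iters=50"])
print(Params.learning_rate, Params.num_iters)  # 0.1 50
```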
On 01/09/2016 04:45 AM, Franc Carter wrote:
>
> Hi,
>
> I'm trying to write a short function that returns the last sunday of
> the week of a given date, code below
>
> def getSunday(day):
>
> day = day.cast("date")
>
> sun = next_day(day, "Sunday")
>
>
On Sat, Jan 9, 2016 at 1:48 PM, Sean Owen wrote:
> (For similar reasons I personally don't favor supporting Java 7 or
> Scala 2.10 in Spark 2.x.)
That reflects my sentiments as well. Thanks Sean for bringing that up!
Jacek
Chiming in late, but my take on this line of argument is: these
companies are welcome to keep using Spark 1.x. If anything the
argument here is about how long to maintain 1.x, and indeed, it's
going to go dormant quite soon.
But using RHEL 6 (or any older version of any platform) and not
wanting
+1
Companies that use the stock Python 2.6 in Red Hat will need to upgrade or
install a fresh version, which takes a total of 3.5 minutes, so no issues ...
On Tue, Jan 5, 2016 at 2:17 AM, Reynold Xin wrote:
> Does anybody here care about us dropping support for Python 2.6 in Spark
> 2.0?
>
> Python 2.6 is anci
See the first half of this wiki:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO
> On Jan 9, 2016, at 1:02 AM, Gavin Yue wrote:
>
> So I tried to set the parquet compression codec to lzo, but Hadoop does not
> have the lzo native libraries, while lz4 is included.
> But I could se
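For what it's worth, the codec is chosen through a single config key; a sketch of setting it from PySpark 1.x is below (the exact set of accepted values, which I believe was uncompressed, snappy, gzip, and lzo in 1.x, is an assumption to verify, as is the `sqlContext` handle):

```python
# Sketch only (needs a Spark 1.x installation; not runnable standalone).
# Assumption: in Spark 1.x this key accepted uncompressed/snappy/gzip/lzo.
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
df.write.parquet("/tmp/out")  # 'df' is a hypothetical DataFrame
```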
From: Gavin Yue
Sent: Saturday, January 9, 2016 14:33
Subject: Re: How to merge two large table and remove duplicates?
To: Ted Yu
Cc: Benyi Wang , user , ayan guha
So I tried to set the parquet compression codec to lzo, but Hadoop does not
have the lzo native libraries, while lz4 is included.
But I could not set the codec to lz4; it only accepts lzo.
Any solution here?
Thanks,
Gavin
On Sat, Jan 9, 2016 at 12:09 AM, Gavin Yue wrote:
> I saw in the document, the valu
I saw in the document that the value is LZO. Is it LZO or LZ4?
https://github.com/Cyan4973/lz4
Based on this benchmark, they differ quite a lot.
On Fri, Jan 8, 2016 at 9:55 PM, Ted Yu wrote:
> gzip is relatively slow. It consumes much CPU.
>
> snappy is faster.
>
> LZ4 is faster than GZIP and
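The speed-versus-ratio trade-off described here can be seen even within one codec by varying the compression level (a stdlib-only sketch; lz4 itself is not in Python's standard library):

```python
import zlib

data = b"the quick brown fox jumps over the lazy dog " * 1000

fast = zlib.compress(data, 1)  # level 1: favors speed
best = zlib.compress(data, 9)  # level 9: favors ratio, costs more CPU
print(len(data), len(fast), len(best))

# both settings round-trip to the original bytes
assert zlib.decompress(fast) == data
assert zlib.decompress(best) == data
```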