Got it - I needed to use the when/otherwise construct; code below:

from pyspark.sql.functions import when, next_day, datediff

def getSunday(day):
    day = day.cast("date")
    # next_day returns the first Sunday strictly after 'day'
    sun = next_day(day, "Sunday")
    n = datediff(sun, day)
    # a gap of 7 days means 'day' itself was already a Sunday
    x = when(n == 7, day).otherwise(sun)
    return x
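For anyone following along outside Spark, a plain-Python sketch (datetime only, no pyspark) that mirrors what the when/otherwise expression computes:

```python
from datetime import date, timedelta

def get_sunday(day):
    """Pure-Python equivalent of the when/otherwise expression above."""
    # Spark's next_day(day, "Sunday") returns the first Sunday strictly
    # after 'day' (Python's weekday(): Mon=0 .. Sun=6).
    delta = (6 - day.weekday()) % 7
    if delta == 0:
        delta = 7
    sun = day + timedelta(days=delta)
    # when(n == 7, day).otherwise(sun): a 7-day gap means 'day'
    # was already a Sunday, so return it unchanged.
    return day if (sun - day).days == 7 else sun

print(get_sunday(date(2016, 1, 6)))   # a Wednesday -> 2016-01-10
print(get_sunday(date(2016, 1, 10)))  # already a Sunday -> 2016-01-10
```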
On 10 January 2016 at 08:41, Franc Carter wrote:
>
> My Python is
Code:
val sc = new SparkContext(sparkConf)
sc.addJar("/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-streaming-kafka-assembly_2.10-1.6.0.jar")
>spark-submit --class "GeoIP" target/scala-2.10/geoip-assembly-1.0.jar
The log shows the jar was added:
16/01/09 16:05:20 INFO SparkContext: Added JAR
/opt/spark-1.6.0-bin-
Hi,
The code below gives me an unexpected result. I expected that
StandardScaler (in ml, not mllib) would take a specified column of an input
DataFrame, subtract the mean of the column, and divide the difference by
the standard deviation of that column.
However, Spark gives me the error
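For comparison, a stdlib-only sketch of the arithmetic described above (this is not the Spark API; note too that, if I recall the defaults correctly, ml's StandardScaler defaults to withMean=False, which can produce results that look "unscaled" around zero):

```python
from statistics import mean, stdev  # stdev uses the sample (n-1) denominator

def standardize(col, with_mean=True, with_std=True):
    """Sketch of per-column standardization: (x - mean) / stdev."""
    m = mean(col) if with_mean else 0.0
    s = stdev(col) if with_std else 1.0
    return [(x - m) / s for x in col]

vals = [1.0, 2.0, 3.0, 4.0, 5.0]
print(standardize(vals))  # symmetric around 0 after centering
```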
Hi,
I have a DataFrame with the columns
ID,Year,Value
I'd like to create a new column that is Value2 - Value1, where Value1 comes
from the row with the corresponding Year1 = Year2 - 1
At the moment I am creating a new DataFrame with renamed columns and doing
DF.join(DF2, . . . .)
This looks cumbersome to me; is there abt
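A window function (lag over Window.partitionBy("ID").orderBy("Year"), if my memory of the pyspark.sql.functions API is right) is the usual alternative to a self-join here. As a plain-Python illustration of the result being computed (the sample data is made up):

```python
# A minimal sketch (plain Python, not the DataFrame API) of the intended
# result: for each ID, subtract the previous year's Value from this year's.
rows = [
    ("a", 2014, 10.0),
    ("a", 2015, 12.5),
    ("b", 2014, 3.0),
    ("b", 2015, 2.0),
]

prev = {(i, y): v for (i, y, v) in rows}  # lookup by (ID, Year)
diffs = {
    (i, y): v - prev[(i, y - 1)]
    for (i, y, v) in rows
    if (i, y - 1) in prev  # skip years with no prior-year row
}
print(diffs)  # {('a', 2015): 2.5, ('b', 2015): -1.0}
```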
My Python is not particularly good, so I'm afraid I don't understand what
that means
cheers
On 9 January 2016 at 14:45, Franc Carter wrote:
>
> Hi,
>
> I'm trying to write a short function that returns the last sunday of the
> week of a given date, code below
>
> def getSunday(day):
>
> day
Hi, All:
I'm trying to read and write from the HDFS cluster using the SparkSQL HiveContext.
My current build of Spark is 1.5.2. The problem is that our company currently
has a very old version of HDFS (Hadoop 2.1.0) and Hive metastore (0.11) from the
Hortonworks bundle.
One of the possible solution i
Please take a look at:
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IDESetup
On Sat, Jan 9, 2016 at 11:16 AM, Jorge Machado wrote:
> Hello everyone,
>
>
> I'm just wondering how you guys develop for Spark.
>
> For example I cannot find any dec
Hello everyone,
I'm just wondering how you guys develop for Spark.
For example I cannot find any decent documentation for connecting Spark to
Eclipse using maven or sbt.
Is there any link around?
Jorge
thanks
Yes, the tiered storage feature in Tachyon can address this issue. Here is
a link to more information:
http://tachyon-project.org/documentation/Tiered-Storage-on-Tachyon.html
Thanks,
Gene
On Wed, Jan 6, 2016 at 8:44 PM, Ted Yu wrote:
> Have you seen this thread ?
>
> http://search-hadoop.com/m/
Hi,
In my app, I have a Params scala object that keeps all the specific
(hyper)parameters of my program. This object is read in each worker. I would
like to be able to pass specific values of the Params' fields in the command
line. One way would be to simply update all the fields of the Params obj
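One common pattern for the command-line part is to parse --key=value pairs and overwrite matching fields before the object is shipped to workers. A hypothetical sketch (the field names and Params shape here are made up, and this is plain Python rather than the poster's Scala object):

```python
# Hypothetical sketch: override fields of a params holder from the
# command line. All names here are illustrative, not from the thread.
class Params:
    learning_rate = 0.01
    num_iters = 100

def apply_overrides(params, argv):
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            key, raw = arg[2:].split("=", 1)
            if hasattr(params, key):
                current = getattr(params, key)
                # coerce the string to the field's existing type
                setattr(params, key, type(current)(raw))
    return params

apply_overrides(Params, ["--learning_rate=0.1", "--num_iters=50"])
print(Params.learning_rate, Params.num_iters)  # 0.1 50
```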
On 01/09/2016 04:45 AM, Franc Carter wrote:
>
> Hi,
>
> I'm trying to write a short function that returns the last sunday of
> the week of a given date, code below
>
> def getSunday(day):
>
> day = day.cast("date")
>
> sun = next_day(day, "Sunday")
>
>
On Sat, Jan 9, 2016 at 1:48 PM, Sean Owen wrote:
> (For similar reasons I personally don't favor supporting Java 7 or
> Scala 2.10 in Spark 2.x.)
That reflects my sentiments as well. Thanks Sean for bringing that up!
Jacek
Chiming in late, but my take on this line of argument is: these
companies are welcome to keep using Spark 1.x. If anything the
argument here is about how long to maintain 1.x, and indeed, it's
going to go dormant quite soon.
But using RHEL 6 (or any older version of any platform) and not
wanting
+1
Companies that use the stock Python 2.6 in Red Hat will need to upgrade or
install a fresh version, which takes a total of 3.5 minutes, so no issues ...
On Tue, Jan 5, 2016 at 2:17 AM, Reynold Xin wrote:
> Does anybody here care about us dropping support for Python 2.6 in Spark
> 2.0?
>
> Python 2.6 is anci
See the first half of this wiki:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO
> On Jan 9, 2016, at 1:02 AM, Gavin Yue wrote:
>
> So I tried to set the parquet compression codec to lzo, but Hadoop does not
> have the lzo native libraries, while lz4 is included.
> But I could se
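For what it's worth, the codec is chosen through a single config key; a sketch of setting it from PySpark 1.x is below (the exact set of accepted values, which I believe was uncompressed, snappy, gzip, and lzo in 1.x, is an assumption to verify, as is the `sqlContext` handle):

```python
# Sketch only (needs a Spark 1.x installation; not runnable standalone).
# Assumption: in Spark 1.x this key accepted uncompressed/snappy/gzip/lzo.
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
df.write.parquet("/tmp/out")  # 'df' is a hypothetical DataFrame
```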
From: Gavin Yue
Sent: Saturday, January 9, 2016 14:33
Subject: Re: How to merge two large table and remove duplicates?
To: Ted Yu
Cc: Benyi Wang , user , ayan guha
So I tried to set the parquet compression codec to lzo, but Hadoop does not
have the lzo native libraries, while lz4 is included.
But I could not set the codec to lz4; it only accepts lzo.
Any solution here?
Thanks,
Gavin
On Sat, Jan 9, 2016 at 12:09 AM, Gavin Yue wrote:
> I saw in the document, the valu
I saw in the document that the value is LZO. Is it LZO or LZ4?
https://github.com/Cyan4973/lz4
Based on this benchmark, they differ quite a lot.
On Fri, Jan 8, 2016 at 9:55 PM, Ted Yu wrote:
> gzip is relatively slow. It consumes much CPU.
>
> snappy is faster.
>
> LZ4 is faster than GZIP and
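The speed-versus-ratio trade-off described here can be seen even within one codec by varying the compression level (a stdlib-only sketch; lz4 itself is not in Python's standard library):

```python
import zlib

data = b"the quick brown fox jumps over the lazy dog " * 1000

fast = zlib.compress(data, 1)  # level 1: favors speed
best = zlib.compress(data, 9)  # level 9: favors ratio, costs more CPU
print(len(data), len(fast), len(best))

# both settings round-trip to the original bytes
assert zlib.decompress(fast) == data
assert zlib.decompress(best) == data
```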