Re: Deep learning libraries for scala

2016-09-30 Thread Suresh Thalamati
TensorFrames: https://spark-packages.org/package/databricks/tensorframes Hope that helps. -Suresh > On Sep 30, 2016, at 8:00 PM, janardhan shetty wrote: > Looking for Scala dataframes in particular?

Re: Deep learning libraries for scala

2016-09-30 Thread janardhan shetty
Looking for Scala dataframes in particular? On Fri, Sep 30, 2016 at 7:46 PM, Gavin Yue wrote: > You could try Skymind. It is Java. > I never tested it, though. > On Sep 30, 2016, at 7:30 PM, janardhan shetty wrote: > Hi, > Are there

Re: Deep learning libraries for scala

2016-09-30 Thread Gavin Yue
You could try Skymind. It is Java; I never tested it, though. > On Sep 30, 2016, at 7:30 PM, janardhan shetty wrote: > Hi, > Are there any good libraries which can be used for Scala deep learning models? > How can we integrate TensorFlow with Scala ML?

Re: Spark ML Decision Trees Algorithm

2016-09-30 Thread janardhan shetty
It would be good to know which paper inspired the implementation of the decision tree version we use in Spark 2.0. On Fri, Sep 30, 2016 at 4:44 PM, Peter Figliozzi wrote: > It's a good question. People have been publishing papers on decision > trees and various

Deep learning libraries for scala

2016-09-30 Thread janardhan shetty
Hi, Are there any good libraries which can be used for Scala deep learning models? How can we integrate TensorFlow with Scala ML?

Re: get different results when debugging and running scala program

2016-09-30 Thread Jakob Odersky
There is no image attached; I'm not sure how the Apache mailing lists handle them. Can you provide the output as text? Best, --Jakob On Fri, Sep 30, 2016 at 8:25 AM, chen yong wrote: > Hello All, > > I am using IDEA 15.0.4 to debug a scala program. It is strange to me

Spark on YARN environment var

2016-09-30 Thread Saurabh Malviya (samalviy)
Hi, I am running Spark on YARN using Oozie. When submitted through the command line using spark-submit, Spark is able to read the env variable. But when submitted through Oozie, it is not able to get the env variable and I don't see the driver log. Is there any way to specify an env variable in the Oozie Spark action?
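For illustration, one way to pass environment variables on YARN is through Spark configuration rather than the shell environment -- a minimal sketch, where MY_VAR and its value are hypothetical:

    import org.apache.spark.SparkConf

    // Sketch: spark.yarn.appMasterEnv.* sets env vars for the application
    // master (the driver in yarn-cluster mode); spark.executorEnv.* sets
    // them for executors. MY_VAR is a made-up name.
    val conf = new SparkConf()
      .set("spark.yarn.appMasterEnv.MY_VAR", "some-value")
      .set("spark.executorEnv.MY_VAR", "some-value")

The same keys can also be passed as --conf entries in the spark-opts of the Oozie Spark action, assuming your Oozie version supports them.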

Re: Design considerations for batch and speed layers

2016-09-30 Thread Rodrick Brown
We process millions of records using Kafka, Elastic Search, Accumulo, Mesos, Spark & Vertica. There's a pattern for this type of pipeline today called SMACK; more about it here -- http://www.slideshare.net/akirillov/data-processing-platforms-architectures-with-spark-mesos-akka-cassandra-and-kafka

Re: Spark ML Decision Trees Algorithm

2016-09-30 Thread Peter Figliozzi
It's a good question. People have been publishing papers on decision trees and various methods of constructing and pruning them for over 30 years. I think it's rather a question for a historian at this point. On Fri, Sep 30, 2016 at 5:08 PM, janardhan shetty wrote: >

Re: Spark ML Decision Trees Algorithm

2016-09-30 Thread janardhan shetty
I read that explanation, but I am wondering whether this algorithm is based on a research paper, for a more detailed understanding. On Fri, Sep 30, 2016 at 1:36 PM, Kevin Mellott wrote: > The documentation details the algorithm being used at >

Re: Issues in compiling spark 2.0.0 code using scala-maven-plugin

2016-09-30 Thread satyajit vegesna
I am trying to compile code using Maven, which was working with Spark 1.6.2, but when I try Spark 2.0.0 I get the error below: org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (default) on

Re: Design considerations for batch and speed layers

2016-09-30 Thread Ashok Kumar
Can one design a fast pipeline with Kafka, Spark Streaming and HBase, or something similar? On Friday, 30 September 2016, 17:17, Mich Talebzadeh wrote: I have designed this prototype for a risk business. Here I would like to discuss issues with batch
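As one illustration of the ingestion side of such a pipeline, a minimal Scala sketch using the direct Kafka stream (spark-streaming-kafka-0-8 API; broker and topic names are assumptions):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Hypothetical broker and topic; ssc is a StreamingContext built elsewhere.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("trades"))
    // Each batch can then be transformed and written to HBase inside foreachRDD.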

Re: Spark ML Decision Trees Algorithm

2016-09-30 Thread Kevin Mellott
The documentation details the algorithm being used at http://spark.apache.org/docs/latest/mllib-decision-tree.html Thanks, Kevin On Fri, Sep 30, 2016 at 1:14 AM, janardhan shetty wrote: > Hi, > > Any help here is appreciated .. > > On Wed, Sep 28, 2016 at 11:34 AM,

Re: Dataframe Grouping - Sorting - Mapping

2016-09-30 Thread Kevin Mellott
When you perform a .groupBy, you need to perform an aggregate immediately afterwards. For example: val df1 = df.groupBy("colA").agg(sum(df("colB"))) df1.show() More information and examples can be found in the documentation below.
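A self-contained version of that pattern, for reference (column names are made up):

    import org.apache.spark.sql.functions._

    // Group by colA and sum colB within each group; colA/colB are hypothetical.
    val grouped = df.groupBy("colA").agg(sum("colB").alias("totalB"))
    grouped.show()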

Having parallelized job inside getPartitions method causes job hanging

2016-09-30 Thread Zhang, Yanyan
Hi there, My team created a class extending RDD, and in its getPartitions method we run a parallelized job. We noticed Spark hangs if we do shuffling on our RDD instance. I'm just wondering if this is a valid use case and if the Spark team could provide us with some suggestions. We

Re: Pls assist: Spark 2.0 build failure on Ubuntu 16.06

2016-09-30 Thread Marco Mistroni
Hi all, this problem is still bothering me. Here's my setup: - Ubuntu 16.06 - Java 8 - Spark 2.0 - I have launched the following command: ./build/mvn -X -Pyarn -Phadoop-2.7 -DskipTests clean package and I am getting this exception: org.apache.maven.lifecycle.LifecycleExecutionException: Failed to

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-09-30 Thread Vadim Semenov
Run more, smaller executors: change `spark.executor.memory` to 32g and `spark.executor.cores` to 2-4, for example. Changing the driver's memory won't help because the driver doesn't participate in execution. On Fri, Sep 30, 2016 at 2:58 PM, Babak Alipour wrote: > Thank you for your
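For illustration, the suggested sizing expressed as configuration -- a sketch only; the right numbers depend on the cluster:

    import org.apache.spark.SparkConf

    // More, smaller executors: less memory per JVM, fewer cores each.
    val conf = new SparkConf()
      .set("spark.executor.memory", "32g")
      .set("spark.executor.cores", "4")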

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-09-30 Thread Babak Alipour
Thank you for your replies. @Mich, using LIMIT 100 in the query prevents the exception but given the fact that there's enough memory, I don't think this should happen even without LIMIT. @Vadim, here's the full stack trace: Caused by: java.lang.IllegalArgumentException: Cannot allocate a page

Re: Restful WS for Spark

2016-09-30 Thread Mahendra Kutare
Try Cloudera Livy https://github.com/cloudera/livy It may be helpful for your requirement. Cheers, Mahendra about.me/mahendrakutare

Re: Restful WS for Spark

2016-09-30 Thread gobi s
Hi All, here is a sample Spark project which uses REST: http://techgobi.blogspot.in/2016/09/bigdata-sample-project.html On Fri, Sep 30, 2016 at 11:39 PM, Vadim Semenov wrote: > There are two REST job servers that work with Spark: > >

Re: Restful WS for Spark

2016-09-30 Thread Vadim Semenov
There are two REST job servers that work with Spark: https://github.com/spark-jobserver/spark-jobserver https://github.com/cloudera/livy On Fri, Sep 30, 2016 at 2:07 PM, ABHISHEK wrote: > Hello all, > Have you tried accessing a Spark application using RESTful web services? >

Restful WS for Spark

2016-09-30 Thread ABHISHEK
Hello all, Have you tried accessing a Spark application using RESTful web services? I have a requirement where a remote user submits a request with some data; it should be sent to Spark and the job should run in Hadoop cluster mode. The output should be sent back to the user. Please share your expertise.

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-09-30 Thread Vadim Semenov
Can you post the whole exception stack trace? What are your executor memory settings? Right now I assume that it happens in UnsafeExternalRowSorter -> UnsafeExternalSorter:insertRecord Running more executors with lower `spark.executor.memory` should help. On Fri, Sep 30, 2016 at 12:57 PM,

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-09-30 Thread Mich Talebzadeh
What will happen if you LIMIT the result set to 100 rows only -- SELECT ... FROM ... ORDER BY field LIMIT 100? Will that work? How about running the whole query WITHOUT the ORDER BY? HTH, Dr Mich Talebzadeh

DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-09-30 Thread Babak Alipour
Greetings everyone, I'm trying to read a single field of a Hive table stored as Parquet in Spark (~140GB for the entire table, this single field should be just a few GB) and look at the sorted output using the following: sql("SELECT " + field + " FROM MY_TABLE ORDER BY " + field + " DESC") But

Design considerations for batch and speed layers

2016-09-30 Thread Mich Talebzadeh
I have designed this prototype for a risk business. Here I would like to discuss issues with the batch layer. *Apologies about being long-winded.* *Business objective* Reduce risk in the credit business while making better credit and trading decisions. Specifically, to identify risk trends within

get different results when debugging and running scala program

2016-09-30 Thread chen yong
Hello All, I am using IDEA 15.0.4 to debug a Scala program. It is strange to me that the results were different when I debugged versus ran the program. The differences can be seen in the attached files run.jpg and debug.jpg. The code lines of the Scala program are shown below. Thank you all

Replying same post with proper formatting. - sorry for extra mail

2016-09-30 Thread vatsal
In my Spark Streaming application I am reading data from a certain Kafka topic. While reading from the topic, whenever I encounter a certain message (for example: "poison") I want to stop the streaming. Currently I am achieving this using the following code: jsc is an instance of JavaStreamingContext and

Stopping spark streaming context on encountering certain type of message on Kafka

2016-09-30 Thread vatsal
In my Spark Streaming application I am reading data from a certain Kafka topic. While reading from the topic, whenever I encounter a certain message (for example: "poison") I want to stop the streaming. Currently I am achieving this using the following code: jsc is an instance of JavaStreamingContext and
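A minimal Scala sketch of one such shutdown pattern (the original uses JavaStreamingContext; the "poison" marker, stream contents, and check interval here are assumptions): set a flag when the marker is seen, then stop the context from the main thread:

    import java.util.concurrent.atomic.AtomicBoolean

    val stopFlag = new AtomicBoolean(false)

    // stream is assumed to be a DStream[String] built from the Kafka topic;
    // flag the shutdown when the marker arrives.
    stream.foreachRDD { rdd =>
      if (rdd.filter(_.contains("poison")).count() > 0) stopFlag.set(true)
    }

    ssc.start()
    var terminated = false
    while (!terminated) {
      // Returns true once the context has actually stopped.
      terminated = ssc.awaitTerminationOrTimeout(10000L)
      if (!terminated && stopFlag.get()) {
        ssc.stop(stopSparkContext = true, stopGracefully = true)
        terminated = true
      }
    }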

Grouped windows in spark streaming

2016-09-30 Thread Adrienne Kole
Hi all, I am using Spark Streaming for my use case. I want to: partition or group the stream by key, window the tuples in the partitions, and find the max/min element in the windows (in every partition). My code is like: val keyedStream = socketDataSource.map(s =>
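For illustration, a minimal Scala sketch of keying, windowing, and taking the per-key max (the field split and window durations are assumptions):

    import org.apache.spark.streaming.Seconds

    // Key the stream, then reduce within a 30s window sliding every 10s.
    // The (key, value) parsing below is hypothetical.
    val keyedStream = socketDataSource.map { s =>
      val parts = s.split(",")
      (parts(0), parts(1).toDouble)
    }
    val maxPerKey = keyedStream.reduceByKeyAndWindow(
      (a: Double, b: Double) => math.max(a, b),  // max element per key
      Seconds(30),   // window length
      Seconds(10))   // slide interval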

Re: SPARK CREATING EXTERNAL TABLE

2016-09-30 Thread Mich Talebzadeh
This should work with Spark 2.0.0 and Hive 2.0.1. //create external table in a Hive database with CTAS scala> spark.sql(""" CREATE EXTERNAL TABLE test.extPrices LOCATION "/tmp/extPrices" AS SELECT * FROM test.prices LIMIT 5""") res4: org.apache.spark.sql.DataFrame = [] Now if I go to Hive and look at

Re: YARN - Pyspark

2016-09-30 Thread ayan guha
I understand; thank you for the explanation. However, I ran using yarn-client mode, submitted using nohup, and I could see the logs going into the log file throughout the life of the job. Everything worked well on the Spark side; YARN just reported success long before the job actually completed. I would love

SPARK CREATING EXTERNAL TABLE

2016-09-30 Thread Trinadh Kaja
Hi All, I am facing a different problem using Spark; I am using spark-sql. Below are the details: sqlcontext.sql("""create external table location '/' as select * from XXX""") This is my query. The table is created successfully, but in Hive the DESCRIBE FORMATTED command shows MANAGED_TABLE,

Dataframe Grouping - Sorting - Mapping

2016-09-30 Thread AJT
I'm looking to do the following with my Spark dataframe: (1) val df1 = df.groupBy() (2) val df2 = df1.sort() (3) val df3 = df2.mapPartitions() I can already groupBy the column (in this case a long timestamp) - but have no idea how to then ensure the returned GroupedData is sorted by the same
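One possible way to express that pipeline without groupBy -- a sketch only; the column name "ts" is an assumption -- is to repartition by the key and sort within partitions:

    import org.apache.spark.sql.functions.col

    // Co-locate rows with the same key, sort inside each partition,
    // then process each sorted partition as an iterator of rows.
    val sorted = df.repartition(col("ts")).sortWithinPartitions(col("ts"))
    val result = sorted.rdd.mapPartitions { rows =>
      rows  // replace with the per-partition logic
    }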

Re: YARN - Pyspark

2016-09-30 Thread Timur Shenkao
It's not weird behavior. Did you run the job in cluster mode? I suspect your driver died / finished / stopped after 12 hours but your job continued. It's possible since you didn't output anything to the console on the driver node. Quite a long time ago, when I first tried Spark Streaming, I launched PySpark

Re: Using Spark as a Maven dependency but with Hadoop 2.6

2016-09-30 Thread Steve Loughran
On 29 Sep 2016, at 10:37, Olivier Girardot wrote: I know that the code itself would not be the same, but it would be useful to at least have the pom/build.sbt transitive dependencies be different when fetching the artifact
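Until such per-Hadoop-version artifacts exist, a common workaround on the user side is to exclude Spark's transitive Hadoop dependency and pin your own -- a build.sbt sketch with assumed versions:

    // build.sbt sketch; the Spark and Hadoop versions are assumptions.
    libraryDependencies ++= Seq(
      ("org.apache.spark" %% "spark-core" % "2.0.0")
        .exclude("org.apache.hadoop", "hadoop-client"),
      "org.apache.hadoop" % "hadoop-client" % "2.6.0"
    )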

RE: udf of aggregation in pyspark dataframe ?

2016-09-30 Thread Mendelson, Assaf
I may be missing something here, but it seems to me you can do it like this: df.groupBy('a').agg(collect_list('c').alias("a"), collect_list('d').alias("b")).withColumn('named_list', my_zip(F.col("a"), F.col("b"))) without needing to write a new aggregation function. -Original Message- From:

Lots of spark-assembly jars localized to /usercache/username/filecache directory

2016-09-30 Thread Lantao Jin
Hi, Our Spark is deployed on YARN, and I found there were lots of spark-assembly jars in heavy Spark users' filecache directories (i.e. /usercache/username/filecache); as you know, the assembly jar is bigger than 100 MB before Spark v2. So all of them take 26GB (1/4 of the reserved space) in most of

Re: compatibility issue with Jersey2

2016-09-30 Thread SimonL
Hi, I'm a new subscriber; has there been any solution to the below issue? Many thanks, Simon

YARN - Pyspark

2016-09-30 Thread ayan guha
Hi, I just observed a little weird behavior: I ran a pyspark job, a very simple one. conf = SparkConf() conf.setAppName("Historical Meter Load") conf.set("spark.yarn.queue","root.Applications") conf.set("spark.executor.instances","50") conf.set("spark.executor.memory","10g")

Re: spark listener do not get fail status

2016-09-30 Thread Aseem Bansal
Hi, In case my previous email was lacking in details, here are some more: - using Spark 2.0.0 - launching the job using org.apache.spark.launcher.SparkLauncher.startApplication(myListener) - checking state in the listener's stateChanged method On Thu, Sep 29, 2016 at 5:24 PM, Aseem
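For reference, a minimal Scala sketch of that launch-with-listener setup (the jar path and main class are hypothetical):

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    // Listener that prints every state transition reported by the launcher.
    val listener = new SparkAppHandle.Listener {
      override def stateChanged(handle: SparkAppHandle): Unit =
        println(s"state changed: ${handle.getState}")
      override def infoChanged(handle: SparkAppHandle): Unit = ()
    }

    // Hypothetical application jar and main class.
    val handle = new SparkLauncher()
      .setAppResource("/path/to/app.jar")
      .setMainClass("com.example.Main")
      .setMaster("yarn")
      .startApplication(listener)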

Re: Spark ML Decision Trees Algorithm

2016-09-30 Thread janardhan shetty
Hi, Any help here is appreciated .. On Wed, Sep 28, 2016 at 11:34 AM, janardhan shetty wrote: > Is there a reference to the research paper which is implemented in spark > 2.0 ? > > On Wed, Sep 28, 2016 at 9:52 AM, janardhan shetty > wrote: > >>