Re: Temp checkpoint directory for EMR (S3 or HDFS)

2017-05-30 Thread Asher Krim
checkpointDirectory); sparkContext.setCheckpointDir(checkpointPath); Asher Krim Senior Software Engineer On Tue, May 30, 2017 at 12:37 PM, Everett Anderson <ever...@nuna.com.invalid > wrote: > Still haven't found a --conf option. > > Regarding a temporary HDFS checkpoint directory, it looks lik
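For reference, a minimal Scala sketch of the same idea (the paths here are hypothetical; on EMR an HDFS path keeps checkpoint I/O on the cluster, while an S3 URI also works but is typically slower):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("checkpoint-example").getOrCreate()
  // or e.g. "s3://my-bucket/tmp/checkpoints" to checkpoint to S3 instead
  spark.sparkContext.setCheckpointDir("hdfs:///tmp/checkpoints")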

Re: KMean clustering resulting Skewed Issue

2017-03-29 Thread Asher Krim
any bag-of-words approach to clustering will likely fail unless you first convert the features to a smaller and denser space Asher Krim Senior Software Engineer On Wed, Mar 29, 2017 at 5:49 PM, Reth RM <reth.ik...@gmail.com> wrote: > Hi Krim, > > The dataset that I am experimenti
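A minimal Scala sketch of that suggestion, assuming a DataFrame docs with a String column "text"; the column names, vector size and k are hypothetical, and Word2Vec stands in for any of the denser representations mentioned in the thread (TF-IDF plus dimensionality reduction, LDA, doc2vec, etc.):

  import org.apache.spark.ml.Pipeline
  import org.apache.spark.ml.clustering.KMeans
  import org.apache.spark.ml.feature.{Tokenizer, Word2Vec}

  val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
  // map each document to a dense 100-dimensional vector instead of a huge,
  // sparse bag-of-words vector
  val w2v = new Word2Vec().setInputCol("words").setOutputCol("features").setVectorSize(100)
  val kmeans = new KMeans().setK(10).setFeaturesCol("features")
  val model = new Pipeline().setStages(Array(tokenizer, w2v, kmeans)).fit(docs)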

Re: KMean clustering resulting Skewed Issue

2017-03-26 Thread Asher Krim
, LDA, document2vec, etc). Other than that, this isn't a Spark question. Asher Krim Senior Software Engineer On Fri, Mar 24, 2017 at 9:37 PM, Reth RM <reth.ik...@gmail.com> wrote: > Hi, > > I am using spark k mean for clustering records that consist of news > documents, v

Re: HBase Spark

2017-02-03 Thread Asher Krim
, Benjamin Kim <bbuil...@gmail.com> wrote: > Asher, > > You’re right. I don’t see anything but 2.11 being pulled in. Do you know > where I can change this? > > Cheers, > Ben > > > On Feb 3, 2017, at 10:50 AM, Asher Krim <ak...@hubspot.com> wrote: > > Sorry f

Re: HBase Spark

2017-02-03 Thread Asher Krim
ideas? > > Cheers, > Ben > > > On Feb 3, 2017, at 8:16 AM, Asher Krim <ak...@hubspot.com> wrote: > > Did you check the actual maven dep tree? Something might be pulling in a > different version. Also, if you're seeing this locally, you might want to > check wh

Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-03 Thread Asher Krim
differences between MLeap and vanilla Spark? What does Tensorflow support look like? I would love to serve models from a java stack while being agnostic to what framework was used to train them. Thanks, Asher Krim Senior Software Engineer On Fri, Feb 3, 2017 at 11:53 AM, Hollin Wilkins <

Re: HBase Spark

2017-02-03 Thread Asher Krim
Did you check the actual maven dep tree? Something might be pulling in a different version. Also, if you're seeing this locally, you might want to check which version of the scala sdk your IDE is using Asher Krim Senior Software Engineer On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim <bb

Re: HBase Spark

2017-02-02 Thread Asher Krim
Ben, That looks like a scala version mismatch. Have you checked your dep tree? Asher Krim Senior Software Engineer On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > Elek, > > Can you give me some sample code? I can’t get mine to work. > > import
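In case it helps, one way to inspect the resolved Scala artifacts in a Maven build (the filter is optional and just narrows the output) is:

  mvn dependency:tree -Dincludes=org.scala-lang

In the full, unfiltered tree, every Spark and connector artifact should carry the same Scala binary suffix (e.g. all _2.10 or all _2.11); a mix of the two is the usual cause of this kind of mismatch error.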

Re: mysql and Spark jdbc

2017-01-12 Thread Asher Krim
Have you tried using an alias? You should be able to replace ("dbtable", "sometable") with ("dbtable", "SELECT utc_timestamp AS my_timestamp FROM sometable") -- Asher Krim Senior Software Engineer On Thu, Jan 12, 2017 at 10:49 AM, Jorge Machado <jom...@me.com
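A minimal Scala sketch of the alias workaround, assuming a SparkSession named spark; the URL, credentials, table and column names are hypothetical, and the subquery is wrapped in parentheses with an alias so it can be used where Spark's JDBC source expects a table expression:

  val df = spark.read
    .format("jdbc")
    .option("url", "jdbc:mysql://dbhost:3306/mydb")
    .option("dbtable", "(SELECT utc_timestamp AS my_timestamp FROM sometable) AS t")
    .option("user", "user")
    .option("password", "password")
    .load()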

Re: How to save spark-ML model in Java?

2017-01-12 Thread Asher Krim
pipeline model using spark ML (Java), the following >> exception is thrown. >> >> >> java.lang.UnsupportedOperationException: Pipeline write will fail on >> this Pipeline because it contains a stage which does not implement >> Writable. Non-Writable stage: rfc_98f8c9e0bd04 of type class >> org.apache.spark.ml.classification.Rand >> >> >> Here is my code segment. >> >> >> model.write().overwrite().save("mypath"); >> >> >> How to resolve this? >> >> Thanks and regards! >> >> Minudika >> >> > -- Asher Krim Senior Software Engineer
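For what it's worth, a minimal Scala sketch of saving and reloading a fitted pipeline, assuming a Spark version in which every stage implements MLWritable (random forest models gained save/load support in Spark 2.0, which is why older versions raise the Non-Writable stage error above); the path is hypothetical:

  import org.apache.spark.ml.PipelineModel

  // model is a fitted PipelineModel
  model.write.overwrite().save("hdfs:///models/my-pipeline")
  val reloaded = PipelineModel.load("hdfs:///models/my-pipeline")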

Re: Spark ML DataFrame API - need cosine similarity, how to convert to RDD Vectors?

2016-11-15 Thread Asher Krim
ark > mllib can use? I've searched, but haven't found anything. > > Thanks! > -- > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io > -- Asher Krim Senior Software Engineer
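A minimal Scala sketch of the conversion, assuming Spark 2.x (where the ml and mllib Vector types are distinct) and a DataFrame df with an ML Vector column "features"; note that columnSimilarities() computes cosine similarities between matrix columns, so the data needs to be laid out accordingly:

  import org.apache.spark.ml.linalg.Vector
  import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
  import org.apache.spark.mllib.linalg.distributed.RowMatrix

  // convert each ml.linalg.Vector to the older mllib type expected by RowMatrix
  val rows = df.select("features").rdd.map(r => OldVectors.fromML(r.getAs[Vector](0)))
  val similarities = new RowMatrix(rows).columnSimilarities()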

Re: example LDA code ClassCastException

2016-11-03 Thread Asher Krim
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > ... 1 more > -- Asher Krim Senior Software Engineer

Re: LIMIT issue of SparkSQL

2016-10-29 Thread Asher Krim
We have also found LIMIT to take an unacceptable amount of time when reading parquet-formatted data from S3. LIMIT was not strictly needed for our use case, so we worked around it -- Asher Krim Senior Software Engineer On Fri, Oct 28, 2016 at 5:36 AM, Liz Bai <liz...@icloud.com> wrote:

Re: Calculating Min and Max Values using Spark Transformations?

2015-08-28 Thread Asher Krim
Yes, absolutely. Take a look at: https://spark.apache.org/docs/1.4.1/mllib-statistics.html#summary-statistics On Fri, Aug 28, 2015 at 8:39 AM, ashensw <as...@wso2.com> wrote: Hi all, I have a dataset which consists of a large number of features (columns). It is in CSV format. So I loaded it into a
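The relevant call in that doc is Statistics.colStats. A minimal Scala sketch, assuming an RDD[Vector] has already been parsed out of the CSV (the sample values below are made up):

  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}

  val observations = sc.parallelize(Seq(
    Vectors.dense(1.0, 10.0, 100.0),
    Vectors.dense(2.0, 20.0, 200.0),
    Vectors.dense(3.0, 30.0, 300.0)
  ))
  val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)
  println(summary.min)  // column-wise minimums
  println(summary.max)  // column-wise maximums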

Re: Job hang when running random forest

2015-07-29 Thread Asher Krim
Did you get a thread dump? We have experienced similar problems during shuffle operations due to a deadlock in InetAddress. Specifically, look for a runnable thread at something like java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method). Our solution has been to put a timeout around the code
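A thread dump can be taken with jstack <pid> on the affected executor. As for the timeout, a minimal Scala sketch of the idea, assuming the hang is inside a host name lookup; the timeout value and helper name are hypothetical, and a stuck native lookup may not actually be interruptible:

  import java.net.InetAddress
  import java.util.concurrent.{Callable, Executors, TimeUnit, TimeoutException}

  val executor = Executors.newSingleThreadExecutor()

  def lookupWithTimeout(host: String, timeoutMs: Long): Option[InetAddress] = {
    val future = executor.submit(new Callable[InetAddress] {
      override def call(): InetAddress = InetAddress.getByName(host)
    })
    try {
      Some(future.get(timeoutMs, TimeUnit.MILLISECONDS))
    } catch {
      case _: TimeoutException =>
        future.cancel(true) // best effort only
        None
    }
  }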

spark task hangs at BinaryClassificationMetrics (InetAddress related)

2015-07-13 Thread Asher Krim
) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) java.lang.Thread.run(Thread.java:745) Thanks, Asher Krim