Re: get -101 error code when running select query

2014-04-23 Thread Madhu
I have seen a similar error message when connecting to Hive through JDBC. This is just a guess on my part, but check your query. The error occurs if you have a select that includes a null literal with an alias, like this:

    select a, b, null as c, d from foo

In my case, rewriting the query to use

Re: Spark 1.0.0 rc3

2014-05-01 Thread Madhu
I'm guessing EC2 support is not there yet? I was able to build using the binary download on both Windows 7 and RHEL 6 without issues. I tried to create an EC2 cluster, but saw this:

    ~/spark-ec2
    Initializing spark
    ~ ~/spark-ec2
    ERROR: Unknown Spark version
    Initializing shark
    ~ ~/spark-ec2

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-13 Thread Madhu
I just built rc5 on Windows 7 and tried to reproduce the problem described in https://issues.apache.org/jira/browse/SPARK-1712. It works on my machine:

    14/05/13 21:06:47 INFO DAGScheduler: Stage 1 (sum at <console>:17) finished in 4.548 s
    14/05/13 21:06:47 INFO TaskSchedulerImpl: Removed TaskSet

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-14 Thread Madhu
I built rc5 using sbt/sbt assembly on Linux without any problems. There used to be an sbt.cmd for the Windows build; has that been deprecated? If so, I can document the Windows build steps that worked for me. -- View this message in context:

Re: Sorting partitions in Java

2014-05-20 Thread Madhu
with it efficiently and reliably. Is there another solution for sorting arbitrarily large partitions? If not, I don't mind developing and contributing a solution. -- Madhu https://www.linkedin.com/in/msiddalingaiah
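
One standard way to sort a partition that may not fit in memory as a single array is a run-and-merge (external) sort: cut the input into bounded runs, sort each run, then k-way merge the runs. Below is a minimal Scala sketch, assuming the partition is exposed as an Iterator (as it is inside mapPartitions). `RunMergeSort` and `sortIterator` are hypothetical names, and for brevity the sorted runs stay in memory instead of being spilled to disk, so this shows the structure of the technique rather than a finished solution.

```scala
import scala.collection.mutable

// Hypothetical helper illustrating run-and-merge sorting over an Iterator.
// NOTE: runs are held in memory here; a real external sort would spill each
// sorted run to disk and stream it back during the merge.
object RunMergeSort {

  def sortIterator[T](input: Iterator[T], runSize: Int)(implicit ord: Ordering[T]): Iterator[T] = {
    require(runSize > 0, "runSize must be positive")

    // Phase 1: cut the input into runs of at most runSize elements, sort each.
    val runs: Vector[Iterator[T]] =
      input.grouped(runSize).map(chunk => chunk.sorted.iterator).toVector

    // Phase 2: k-way merge. The queue holds (head element, run index);
    // reversing the ordering turns Scala's max-heap into a min-heap.
    val heap = mutable.PriorityQueue.empty[(T, Int)](
      Ordering.by[(T, Int), T](_._1).reverse)
    runs.zipWithIndex.foreach { case (run, i) =>
      if (run.hasNext) heap.enqueue((run.next(), i))
    }

    new Iterator[T] {
      def hasNext: Boolean = heap.nonEmpty
      def next(): T = {
        val (value, i) = heap.dequeue()
        val run = runs(i)
        // Refill the queue from the run that just supplied the minimum.
        if (run.hasNext) heap.enqueue((run.next(), i))
        value
      }
    }
  }
}
```

Inside Spark this could be applied per partition with something like `rdd.mapPartitions(it => RunMergeSort.sortIterator(it, 100000))`; the run size there is an arbitrary example value, and bounding memory for truly large partitions would still require the spilling step noted in the comments.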

Re: Sorting partitions in Java

2014-05-20 Thread Madhu
Andrew mentioned covers the rdd.sortPartitions() use case. Can someone comment on the scope of SPARK-983? Thanks!

Eclipse Scala IDE/Scala test and Wiki

2014-06-02 Thread Madhu
#ContributingtoSpark-IDESetup I can't seem to edit that page. Confluence usually has an Edit button in the upper right, but it does not appear for me, even though I am logged in. Am I missing something?

Re: Building spark in Eclipse Kepler

2014-08-07 Thread Madhu
to build *core* in Eclipse Kepler? In my view, tool independence is a good thing. I'll do what I can to support Eclipse.

Re: Unit test best practice for Spark-derived projects

2014-08-07 Thread Madhu
How long does it take to get a SparkContext? I found that if you don't have a network connection, it can take up to 30 seconds to start up locally, most likely because of a reverse DNS lookup. I think a hosts file entry is sufficient.
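
To check whether hostname resolution is really what slows local start-up, one rough diagnostic (assuming the delay is in the JVM's own local hostname lookup) is to time that lookup in isolation. `HostnameLookupTimer` is a hypothetical helper built only on `java.net.InetAddress`; it does not touch Spark.

```scala
import java.net.InetAddress
import scala.util.Try

// Hypothetical diagnostic: times how long the JVM takes to resolve the local
// host and its canonical name (which may involve a reverse DNS lookup).
// Many seconds here suggests the hostname is not resolvable locally, and a
// hosts file entry mapping it to a local address is likely to help.
object HostnameLookupTimer {

  // Returns the canonical hostname (or the failure) and the elapsed milliseconds.
  def timeLocalHostLookup(): (Try[String], Long) = {
    val start = System.nanoTime()
    val name = Try(InetAddress.getLocalHost.getCanonicalHostName)
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (name, elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    val (name, ms) = timeLocalHostLookup()
    println(s"Local host lookup: $name in $ms ms")
  }
}
```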

Re: Handling stale PRs

2014-08-26 Thread Madhu
, overlapping Jira issues, we probably have to create a meta issue and assign resources to fix it. I don't mind helping with that also.

Re: Handling stale PRs

2014-08-26 Thread Madhu
might be a reason for their success.

Re: Jira tickets for starter tasks

2014-08-29 Thread Madhu
. Just my $0.02

Re: [ANNOUNCE] Spark 1.2.0 Release Preview Posted

2014-11-20 Thread Madhu
Thanks Patrick. I've been testing some 1.2 features, and they look good so far. I have some example code that I think will be helpful for certain MR-style use cases (secondary sort). Can I still add that to the 1.2 documentation, or is that frozen at this point?
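
The example code mentioned here is not part of this excerpt. For readers unfamiliar with secondary sort, here is a Spark-free model of the usual pattern (an assumption about the general technique, not a reconstruction of that example): key each record by a composite (primary, secondary) key, partition on the primary key only, and sort by the full key within each partition, so each primary key's values arrive together and already ordered. In Spark 1.2 and later the same shape is commonly expressed with a custom `Partitioner` plus `repartitionAndSortWithinPartitions`. `SecondarySortModel` and its methods are hypothetical names.

```scala
// Hypothetical, Spark-free model of the secondary-sort pattern.
object SecondarySortModel {

  // Partition on the primary key only (non-negative modulo of its hash).
  def partitionOf[K](primary: K, numPartitions: Int): Int = {
    val mod = primary.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  // Returns one sorted Seq per partition. Sorting on the composite (K, V)
  // key orders records by primary key, then by secondary key (the value).
  def secondarySort[K: Ordering, V: Ordering](
      records: Seq[(K, V)],
      numPartitions: Int): Seq[Seq[(K, V)]] = {
    require(numPartitions > 0, "numPartitions must be positive")
    val byPartition = records.groupBy { case (k, _) => partitionOf(k, numPartitions) }
    (0 until numPartitions).map { p =>
      byPartition.getOrElse(p, Seq.empty).sorted
    }
  }
}
```

Because all records for a primary key land in one partition, a downstream pass can stream through each partition and see every key's values in secondary-key order without buffering them all.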

Re: [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-11 Thread Madhu
-hadoop1.0.4.jar
Ran some of my 1.2 code successfully. Reviewed some docs, and they look good. spark-shell.cmd works as expected. Env details:

    sbtconfig.txt:
    -Xmx1024M -XX:MaxPermSize=256m -XX:ReservedCodeCacheSize=128m

    sbt --version
    sbt launcher version 0.13.1

RDD data flow

2014-12-16 Thread Madhu
. The declaration of Partition is throwing me off. Thanks!
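
The role of Partition is easier to see with a toy model. The sketch below is Spark-free, and every name in it (`MiniRddModel`, `MiniRdd`, `SlicePartition`, and so on) is hypothetical; it only loosely mirrors Spark, where `Partition` is essentially an index and an RDD produces a partition's records lazily through `compute(split, context)`. The point it illustrates is that a partition object identifies a slice of work and carries no data itself.

```scala
// Hypothetical toy model of RDD data flow; not Spark's actual classes.
object MiniRddModel {

  // Like Spark's Partition, this only identifies a slice by index.
  trait Partition extends Serializable { def index: Int }

  final case class SlicePartition(index: Int) extends Partition

  abstract class MiniRdd[T] {
    def partitions: Array[Partition]
    // Produces the records of one partition on demand, as an Iterator.
    def compute(split: Partition): Iterator[T]

    def map[U](f: T => U): MiniRdd[U] = new MappedMiniRdd[T, U](this, f)
    def collect(): List[T] = partitions.toList.flatMap(p => compute(p).toList)
  }

  // Source RDD: the data lives with the RDD; partitions are only slice indices.
  final class ParallelMiniRdd[T](data: Seq[T], numSlices: Int) extends MiniRdd[T] {
    require(numSlices > 0, "numSlices must be positive")
    private val sliceSize = math.max(1, math.ceil(data.size.toDouble / numSlices).toInt)
    def partitions: Array[Partition] =
      Array.tabulate[Partition](numSlices)(i => SlicePartition(i))
    def compute(split: Partition): Iterator[T] =
      data.slice(split.index * sliceSize, (split.index + 1) * sliceSize).iterator
  }

  // Narrow dependency: same partitions as the parent; compute pulls from it.
  final class MappedMiniRdd[T, U](parent: MiniRdd[T], f: T => U) extends MiniRdd[U] {
    def partitions: Array[Partition] = parent.partitions
    def compute(split: Partition): Iterator[U] = parent.compute(split).map(f)
  }
}
```

Calling `compute` on the mapped RDD with partition 0 asks the parent for partition 0 and transforms its records as they stream through, which is the pull-based flow the index-only Partition makes possible.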

Re: RDD data flow

2014-12-17 Thread Madhu
I'll add this to the docs. Thanks Patrick!

Re: Detecting configuration problems

2015-09-08 Thread Madhu
and raise an alarm if it's getting too high. Even a warning on the console would be better than a catastrophic OOM.
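
A minimal sketch of that idea, assuming a JVM-level heap check is a reasonable proxy: sample heap usage through the standard `java.lang.management` API and print a console warning once it crosses a threshold. `HeapUsageWatch`, `checkHeap`, and the 0.9 default threshold are hypothetical choices, not Spark settings or APIs.

```scala
import java.lang.management.ManagementFactory

// Hypothetical "warn before OOM" check based on JVM heap usage.
object HeapUsageWatch {

  // Fraction of the maximum heap currently in use, or None when the JVM
  // reports no upper bound (getMax returns -1 in that case).
  def heapUsedFraction(): Option[Double] = {
    val usage = ManagementFactory.getMemoryMXBean.getHeapMemoryUsage
    val max = usage.getMax
    if (max <= 0) None else Some(usage.getUsed.toDouble / max)
  }

  // Returns a warning message when usage is at or above the threshold.
  def checkHeap(threshold: Double = 0.9): Option[String] =
    heapUsedFraction().collect {
      case frac if frac >= threshold =>
        f"WARNING: heap usage at ${frac * 100}%.1f%% of max (threshold ${threshold * 100}%.0f%%)"
    }

  def main(args: Array[String]): Unit =
    checkHeap().foreach(msg => Console.err.println(msg))
}
```

In practice such a check would run periodically (for example from a scheduled thread) so the warning appears before memory is exhausted; how often to sample is left open here.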

Re: spark-shell 1.5 doesn't seem to work in local mode

2015-09-19 Thread Madhu
, but that should be about it. Does 1.5.0 pick up HADOOP_INSTALL? Wouldn't spark-shell --master local override that? 1.5 seemed to completely ignore --master local.

spark-shell 1.5 doesn't seem to work in local mode

2015-09-19 Thread Madhu
ore
    <console>:10: error: not found: value sqlContext
           import sqlContext.implicits._
                  ^
    <console>:10: error: not found: value sqlContext
           import sqlContext.sql
                  ^

Help needed to publish SizeEstimator as separate library

2014-11-19 Thread madhu phatak
Hi, As I was going through the Spark source code, SizeEstimator https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala caught my eye. It's a very useful tool for estimating object sizes on the JVM, which helps in use cases like a memory-bounded cache. It
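
To illustrate the memory-bounded cache use case, here is a sketch of an LRU cache with a byte budget. The size function is a caller-supplied parameter; in practice it might wrap a JVM size estimator such as the SizeEstimator linked above, but that wiring is an assumption and is not shown. `SizeBoundedLruCache` is a hypothetical name.

```scala
import scala.collection.mutable

// Hypothetical memory-bounded LRU cache. sizeOf is supplied by the caller
// (for example a wrapper around a JVM size estimator); entries are evicted
// least-recently-used first until the estimated total fits in maxBytes.
final class SizeBoundedLruCache[K, V](maxBytes: Long, sizeOf: V => Long) {
  // accessOrder = true makes iteration run from least to most recently used.
  private val entries = new java.util.LinkedHashMap[K, V](16, 0.75f, true)
  private val sizes = mutable.Map.empty[K, Long]
  private var totalBytes = 0L

  def currentBytes: Long = totalBytes

  def get(key: K): Option[V] = Option(entries.get(key))

  def put(key: K, value: V): Unit = {
    remove(key)
    val size = sizeOf(value)
    // Values larger than the whole budget are not cached at all.
    if (size <= maxBytes) {
      entries.put(key, value)
      sizes(key) = size
      totalBytes += size
      evictUntilWithinBudget()
    }
  }

  def remove(key: K): Unit =
    if (entries.containsKey(key)) {
      entries.remove(key)
      totalBytes -= sizes.remove(key).getOrElse(0L)
    }

  private def evictUntilWithinBudget(): Unit = {
    val it = entries.keySet().iterator() // least recently used first
    while (totalBytes > maxBytes && it.hasNext) {
      val oldest = it.next()
      it.remove()
      totalBytes -= sizes.remove(oldest).getOrElse(0L)
    }
  }
}
```

The cache trusts whatever `sizeOf` returns, so its accuracy is only as good as the estimator plugged in; a sampling estimator trades precision for speed on large object graphs.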

Re: Contributing Documentation Changes

2015-04-24 Thread madhu phatak
wrote: I think that your own tutorials and such should live on your blog. The goal isn't to pull in a bunch of external docs to the site. On Fri, Apr 24, 2015 at 12:57 AM, madhu phatak phatak@gmail.com wrote: Hi, As I was reading contributing to Spark wiki, it was mentioned that we

Contributing Documentation Changes

2015-04-23 Thread madhu phatak
Hi, As I was reading the Contributing to Spark wiki, I saw that we can contribute external links to Spark tutorials. I have written many of them on my blog (http://blog.madhukaraphatak.com/categories/spark/). It would be great if someone could add them to the Spark website. Regards, Madhukara

Review of ML PR

2017-08-14 Thread madhu phatak
Hi, About 2 months ago I opened a PR to improve the performance of decision trees by allowing a flexible, user-provided storage class for intermediate data. I posted a few questions about handling backward compatibility, but there have been no answers for a long time. Can anybody help me move this

RandomForest caching

2017-04-28 Thread madhu phatak
Hi, I am testing RandomForestClassification with 50 GB of data, which is cached in memory. I have 64 GB of RAM, of which 28 GB is used for caching the original dataset. When I run random forest, it caches around 300 GB of intermediate data, which evicts the original dataset from the cache. This caching is triggered

Re: RandomForest caching

2017-05-12 Thread madhu phatak
Hi, I opened a JIRA: https://issues.apache.org/jira/browse/SPARK-20723. Can someone have a look?

On Fri, Apr 28, 2017 at 1:34 PM, madhu phatak <phatak@gmail.com> wrote:
> Hi,
>
> I am testing RandomForestClassification with 50gb of data which is cached
> in memory.

Time window on Processing Time

2017-08-28 Thread madhu phatak
Hi, While playing with Structured Streaming, I observed that the window function always requires a time column in the input data, which means it is event time. Is it possible to have an old Spark Streaming style window based on processing time? I don't see any documentation on this. -- Regards,

Re: Time window on Processing Time

2017-08-30 Thread madhu phatak
e
>
> import org.apache.spark.sql.functions._
>
> ds.withColumn("processingTime", current_timestamp())
>   .groupBy(window(col("processingTime"), "1 minute"))
>   .count()
>
>
> On Mon, Aug 28, 2017 at 5:46 AM, madhu phatak <phatak@gmail.com>
> wrote:
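
To make the processing-time workaround quoted above concrete without a running Spark session, here is a Spark-free sketch of tumbling-window assignment: stamp each record with the time it is processed, then place it in the window [start, start + width) whose start is that timestamp rounded down to a multiple of the width. It mirrors the idea, not Spark's implementation. `ProcessingTimeWindows` and its methods are hypothetical, and the `clock` parameter exists only so the example can be tested deterministically.

```scala
// Hypothetical sketch of tumbling windows over processing time.
object ProcessingTimeWindows {

  final case class Window(startMs: Long, endMs: Long)

  // Window containing the timestamp: start is rounded down to a multiple of widthMs.
  def windowFor(timestampMs: Long, widthMs: Long): Window = {
    require(widthMs > 0, "window width must be positive")
    val start = timestampMs - Math.floorMod(timestampMs, widthMs)
    Window(start, start + widthMs)
  }

  // Counts records per window, stamping each with the clock at processing
  // time (System.currentTimeMillis by default) rather than an event-time field.
  def countByWindow[T](records: Iterator[T], widthMs: Long,
                       clock: () => Long = () => System.currentTimeMillis()): Map[Window, Long] =
    records
      .map(_ => windowFor(clock(), widthMs))
      .foldLeft(Map.empty[Window, Long]) { (acc, w) =>
        acc.updated(w, acc.getOrElse(w, 0L) + 1L)
      }
}
```

Because the timestamp comes from the processing clock instead of the data, the same record can fall into different windows on a replay, which is the main semantic difference from event-time windows.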