Re: Is it possible to pass additional parameters to a python function when used inside RDD.filter method?

2015-12-04 Thread Abhishek Shivkumar
Excellent, that did work - thanks. On 4 December 2015 at 12:35, Praveen Chundi <mail.chu...@gmail.com> wrote: > Passing a lambda function should work. > > my_rdd.filter(lambda x: myfunc(x, newparam)) > > Best regards, > Praveen Chundi > > > On 04.12.2015 13:19,
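The closure pattern Praveen suggests can be sketched without a Spark cluster: a lambda captures the extra parameter and forwards it to the user function. This is a minimal plain-Python analogue (names like my_func and new_param are illustrative, not from any Spark API); in PySpark the same lambda would be passed to my_rdd.filter.

```python
def my_func(x, new_param):
    # Hypothetical predicate: keep values above a threshold.
    return x > new_param

new_param = 10
data = [5, 12, 3, 20]

# In PySpark this would be: my_rdd.filter(lambda x: my_func(x, new_param))
# The same closure pattern with plain Python's built-in filter:
result = list(filter(lambda x: my_func(x, new_param), data))
print(result)  # [12, 20]
```

The lambda closes over new_param at the point of definition, so no change to my_func's signature is needed.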

How to access a RDD (that has been broadcasted) inside the filter method of another RDD?

2015-12-04 Thread Abhishek Shivkumar
Hi, I have RDD1 that is broadcasted. I have a user defined method for the filter functionality of RDD2, written as follows: RDD2.filter(my_func) I want to access the values of RDD1 inside my_func. Is that possible? Should I pass RDD1 as a parameter into my_func? Thanks Abhishek S
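One RDD cannot be referenced inside another RDD's transformation closure; the usual workaround, assuming RDD1 is small enough to collect, is to collect it to the driver and broadcast the result. A sketch of that pattern, with a plain-Python stand-in for the broadcast value (the Spark lines are shown as comments and assume a SparkContext named sc):

```python
# In Spark (sketch): lookup = sc.broadcast(set(RDD1.collect()))
#                    RDD2.filter(lambda x: x in lookup.value)
# Plain-Python analogue of the broadcast-lookup pattern:
lookup = {"a", "b"}                 # stands in for lookup.value
rdd2_data = ["a", "c", "b", "d"]    # stands in for RDD2's elements
filtered = [x for x in rdd2_data if x in lookup]
print(filtered)  # ['a', 'b']
```

The broadcast variable, unlike an RDD, is an ordinary value that every executor can read inside my_func.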

Re: Unable to use "Batch Start Time" on worker nodes.

2015-11-30 Thread Abhishek Anand
version > of transform that allows you specify a function with two params - the > parent RDD and the batch time at which the RDD was generated. > > TD > > On Thu, Nov 26, 2015 at 1:33 PM, Abhishek Anand <abhis.anan...@gmail.com> > wrote: > >> Hi , >> >

Unable to use "Batch Start Time" on worker nodes.

2015-11-26 Thread Abhishek Anand
Hi, I need to use the batch start time in my Spark streaming job. I need the value of the batch start time inside one of the functions that is called within a flatMap function in Java. Please suggest how this can be done. I tried to use the StreamingListener class and set the value of a variable

External Table not getting updated from parquet files written by spark streaming

2015-11-19 Thread Abhishek Anand
Hi, I am using Spark streaming to write the aggregated output as parquet files to HDFS using SaveMode.Append. I have an external table created like: CREATE TABLE if not exists rolluptable USING org.apache.spark.sql.parquet OPTIONS ( path "hdfs:" ); I had an impression that in case

MongoDB and Spark

2015-09-11 Thread Mishra, Abhishek
Hello, Is there any way to query multiple collections from MongoDB using Spark and Java? And I want to create only one Configuration object. Please help if anyone has something regarding this. Thank You Abhishek

RE: MongoDB and Spark

2015-09-11 Thread Mishra, Abhishek
Anything using Spark RDDs? Abhishek From: Sandeep Giri [mailto:sand...@knowbigdata.com] Sent: Friday, September 11, 2015 3:19 PM To: Mishra, Abhishek; user@spark.apache.org; d...@spark.apache.org Subject: Re: MongoDB and Spark use map-reduce. On Fri, Sep 11, 2015, 14:32 Mishra, Abhishek

Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread Abhishek R. Singh
You had: RDD.reduceByKey((x,y) => x+y) RDD.take(3) Maybe try: rdd2 = RDD.reduceByKey((x,y) => x+y) rdd2.take(3) -Abhishek- On Aug 20, 2015, at 3:05 AM, satish chandra j jsatishchan...@gmail.com wrote: HI All, I have data in RDD as mentioned below: RDD : Array[(Int),(Int)] = Array((0,1
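The fix above is to capture reduceByKey's return value: transformations return a new RDD rather than mutating the original. The per-key summing that reduceByKey((x, y) => x + y) performs can be sketched in plain Python (sample pairs are hypothetical):

```python
from collections import defaultdict

# Hypothetical key/value pairs, standing in for the RDD's contents:
data = [(0, 1), (0, 2), (1, 20), (1, 30)]

# Plain-Python equivalent of rdd2 = RDD.reduceByKey((x, y) => x + y):
sums = defaultdict(int)
for key, value in data:
    sums[key] += value

result = sorted(sums.items())
print(result)  # [(0, 3), (1, 50)]
```

In Spark the reduction runs per partition and then across partitions, but the key-wise combining logic is the same.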

Re: tachyon

2015-08-07 Thread Abhishek R. Singh
Thanks Calvin - much appreciated ! -Abhishek- On Aug 7, 2015, at 11:11 AM, Calvin Jia jia.cal...@gmail.com wrote: Hi Abhishek, Here's a production use case that may interest you: http://www.meetup.com/Tachyon/events/222485713/ Baidu is using Tachyon to manage more than 100 nodes

tachyon

2015-08-07 Thread Abhishek R. Singh
Do people use Tachyon in production, or is it experimental grade still? Regards, Abhishek - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org

Re: How to increase parallelism of a Spark cluster?

2015-08-02 Thread Abhishek R. Singh
execution parallelism). [Disclaimer: I am no authority on Spark, but wanted to throw in my spin based on my own understanding.] Nothing official about it :) -abhishek- On Jul 31, 2015, at 1:03 PM, Sujit Pal sujitatgt...@gmail.com wrote: Hello, I am trying to run a Spark job that hits an external

Spark Interview Questions

2015-07-29 Thread Mishra, Abhishek
Hello, Please help me with links or some document for Apache Spark interview questions and answers. Also for the tools related to it, for which questions could be asked. Thanking you all. Sincerely, Abhishek

RE: Spark Interview Questions

2015-07-29 Thread Mishra, Abhishek
Hello Vaquar, I have working knowledge and experience in Spark. I just wanted to test or do a mock round to evaluate myself. Thank you for the reply, Please share something if you have for the same. Sincerely, Abhishek From: vaquar khan [mailto:vaquar.k...@gmail.com] Sent: Wednesday, July 29

spark streaming disk hit

2015-07-21 Thread Abhishek R. Singh
Is it fair to say that Storm stream processing is completely in memory, whereas spark streaming would take a disk hit because of how shuffle works? Does spark streaming try to avoid disk usage out of the box? -Abhishek

Re: spark streaming disk hit

2015-07-21 Thread Abhishek R. Singh
comparison for end-to-end performance. You could take a look at this. https://spark-summit.org/2015/events/towards-benchmarking-modern-distributed-streaming-systems/ On Tue, Jul 21, 2015 at 11:57 AM, Abhishek R. Singh abhis...@tetrationanalytics.com wrote: Is it fair to say that Storm stream

Re: Grouping runs of elements in a RDD

2015-06-30 Thread Abhishek R. Singh
could you use a custom partitioner to preserve boundaries such that all related tuples end up on the same partition? On Jun 30, 2015, at 12:00 PM, RJ Nowling rnowl...@gmail.com wrote: Thanks, Reynold. I still need to handle incomplete groups that fall between partition boundaries. So, I

Read/write metrics for jobs which use S3

2015-06-17 Thread Abhishek Modi
I mostly use Amazon S3 for reading input data and writing output data for my Spark jobs. I want to know the number of bytes read and written by my job from S3. In Hadoop, there are FileSystemCounters for this; is there something similar in Spark? If there is, can you please guide me on how to use

Spark1.3.1 build issue with CDH5.4.0 getUnknownFields

2015-05-28 Thread Abhishek Tripathi
Hi, I'm using the CDH5.4.0 quick start VM and tried to build Spark with Hive compatibility so that I can run Spark SQL and access temp tables remotely. I used the below command to build Spark; the build was successful, but when I tried to access Hive data from Spark SQL, I get an error. Thanks, Abhi

spark sql error with proto/parquet

2015-04-18 Thread Abhishek R. Singh
guidance/help/pointers. Help appreciated. -Abhishek-

Re: Dataframes Question

2015-04-18 Thread Abhishek R. Singh
I am no expert myself, but from what I understand DataFrame is grandfathering SchemaRDD. This was done for API stability as Spark SQL matured out of alpha as part of the 1.3.0 release. It is forward looking and brings (dataframe-like) syntax that was not available with the older SchemaRDD. On

RE: Performance tuning in Spark SQL.

2015-03-02 Thread Abhishek Dubey
Hi, Thank you for your reply. It is surely going to help. Regards, Abhishek Dubey From: Cheng, Hao [mailto:hao.ch...@intel.com] Sent: Monday, March 02, 2015 6:52 PM To: Abhishek Dubey; user@spark.apache.org Subject: RE: Performance tuning in Spark SQL. This is actually a quite open question

Re: How to define SparkContext with Cassandra connection for spark-jobserver?

2015-01-15 Thread abhishek
In the spark job server *bin* folder, you will find the *application.conf* file; put context-settings { spark.cassandra.connection.host = your address } Hope this should work

Re: Removing JARs from spark-jobserver

2015-01-10 Thread abhishek
There is a path /tmp/spark-jobserver/file where all the jars are kept by default; probably deleting from there should work. On 11 Jan 2015 12:51, Sasi [via Apache Spark User List] ml-node+s1001560n21081...@n3.nabble.com wrote: How to remove submitted JARs from spark-jobserver?

Re: Need help for Spark-JobServer setup on Maven (for Java programming)

2014-12-30 Thread abhishek
Hey, why specifically Maven? We set up a Spark job server through sbt, which is an easy way to get the job server up and running. On 30 Dec 2014 13:32, Sasi [via Apache Spark User List] ml-node+s1001560n20896...@n3.nabble.com wrote: Does my question make sense or does it require some elaboration? Sasi

Re: Need help for Spark-JobServer setup on Maven (for Java programming)

2014-12-30 Thread abhishek
Ohh... Just curious: we did a similar use case to yours, getting data out of Cassandra. Since the job server is a REST architecture, all we need is a URL to access it. Why does integrating with your framework matter here when all we need is a URL? On 30 Dec 2014 14:05, Sasi [via Apache Spark User List]

Re: Need help for Spark-JobServer setup on Maven (for Java programming)

2014-12-30 Thread abhishek
Frankly speaking, I never tried this volume in practice, but I believe it should work. On 30 Dec 2014 15:26, Sasi [via Apache Spark User List] ml-node+s1001560n20902...@n3.nabble.com wrote: Thanks Abhishek. We understand your point and will try using the REST URL. However one concern, we had

Is there a way to get column names using hiveContext ?

2014-12-07 Thread abhishek
Hi, I have iplRDD which is a JSON RDD, and I do the steps below and query through hiveContext. I get the results but without column headers. Is there a way to get the column names? val teamRDD = hiveContext.jsonRDD(iplRDD) teamRDD.registerTempTable("teams") hiveContext.cacheTable("teams") val result
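With jsonRDD the inferred schema carries the column names (for example teamRDD.schema in Scala, or df.columns in later PySpark versions). The idea of recovering names from JSON records can be sketched in plain Python; the sample records are hypothetical stand-ins for the rows behind iplRDD:

```python
import json

# Hypothetical JSON records, standing in for the rows behind iplRDD:
records = ['{"team": "CSK", "wins": 3}', '{"team": "MI", "wins": 5}']

# Spark SQL infers the schema from records like these; a plain-Python
# analogue of reading off the column names is the union of keys:
columns = sorted({key for rec in records for key in json.loads(rec)})
print(columns)  # ['team', 'wins']
```

Spark does essentially this during schema inference, which is why the field names are available on the schema even though a plain query result prints without headers.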

RE: Installation On Windows machine

2014-08-27 Thread Mishra, Abhishek
, I am unable to debug the same. Please guide me. Thanks, Abhishek -Original Message- From: Matei Zaharia [mailto:matei.zaha...@gmail.com] Sent: Saturday, August 23, 2014 9:47 AM To: Mishra, Abhishek Cc: user@spark.apache.org Subject: Re: Installation On Windows machine You should

RE: Installation On Windows machine

2014-08-27 Thread Mishra, Abhishek
I got it working, Matei, thank you. I was giving the wrong directory path. Thank you...!! Thanks, Abhishek Mishra -Original Message- From: Mishra, Abhishek [mailto:abhishek.mis...@xerox.com] Sent: Wednesday, August 27, 2014 4:38 PM To: Matei Zaharia Cc: user@spark.apache.org Subject: RE

Installation On Windows machine

2014-08-22 Thread Mishra, Abhishek
with my installation and usage. I want to run it with Java. Looking forward to a reply, Thanking you in Advance, Sincerely, Abhishek Thanks, Abhishek Mishra Software Engineer Innovation Delivery CoE (IDC) Xerox Services India 4th Floor Tapasya, Infopark, Kochi, Kerala, India 682030 m +91-989-516

Fails: Spark sbt/sbt publish local

2014-05-25 Thread ABHISHEK
Hi, I'm trying to install Spark along with Shark. Here are the configuration details: Spark 0.9.1 Shark 0.9.1 Scala 2.10.3 The Spark assembly was successful, but running sbt/sbt publish-local failed. Please refer to the attached log for more details and advise. Thanks, Abhishek SparkhomeSPARK_HADOOP_VERSION=2.0.0

Re: Fails: Spark sbt/sbt publish local

2014-05-25 Thread ABHISHEK
, Aaron Davidson ilike...@gmail.com wrote: I suppose you actually ran publish-local and not publish local like your example showed. That being the case, could you show the compile error that occurs? It could be related to the hadoop version. On Sun, May 25, 2014 at 7:51 PM, ABHISHEK abhi
