Excellent. That did work - thanks.
On 4 December 2015 at 12:35, Praveen Chundi <mail.chu...@gmail.com> wrote:
> Passing a lambda function should work.
>
> my_rdd.filter(lambda x: myfunc(x, newparam))
>
> Best regards,
> Praveen Chundi
>
>
> On 04.12.2015 13:19,
Hi,
I have RDD1 that is broadcast.
I have a user-defined method for the filter functionality of RDD2, written
as follows:
RDD2.filter(my_func)
I want to access the values of RDD1 inside my_func. Is that possible?
Should I pass RDD1 as a parameter into my_func?
Thanks
Abhishek S
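A minimal sketch of the suggestion above, in Scala for illustration (the thread's snippets are PySpark; the same pattern applies there). An RDD cannot be referenced inside another RDD's closure, so the values of RDD1 are collected and broadcast, and the filter function picks them up through the broadcast handle. The data and myFunc are hypothetical.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("broadcast-filter").setMaster("local[*]"))

val rdd1 = sc.parallelize(Seq(1, 2, 3))          // stands in for RDD1
val rdd2 = sc.parallelize(Seq(1, 2, 3, 4, 5, 6)) // stands in for RDD2

// Collect RDD1 on the driver and broadcast the resulting collection;
// the broadcast handle is safe to use inside closures, the RDD is not.
val bcast = sc.broadcast(rdd1.collect().toSet)

// myFunc closes over the broadcast value, mirroring the lambda suggestion.
def myFunc(x: Int, allowed: Set[Int]): Boolean = allowed.contains(x)

val filtered = rdd2.filter(x => myFunc(x, bcast.value))
println(filtered.collect().mkString(", ")) // 1, 2, 3

sc.stop()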
version
> of transform that allows you to specify a function with two params - the
> parent RDD and the batch time at which the RDD was generated.
>
> TD
>
> On Thu, Nov 26, 2015 at 1:33 PM, Abhishek Anand <abhis.anan...@gmail.com>
> wrote:
>
>> Hi ,
>>
>
Hi,
I need to use the batch start time in my Spark Streaming job.
I need the value of the batch start time inside one of the functions that is
called within a flatMap function in Java.
Please suggest how this can be done.
I tried to use the StreamingListener class and set the value of a variable
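A minimal sketch of the transform variant described above (Scala, with a hypothetical socket source): transform exposes the batch time alongside each batch's RDD, so the time can be threaded into whatever function the flatMap would otherwise call.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}

val ssc = new StreamingContext(
  new SparkConf().setAppName("batch-time").setMaster("local[2]"), Seconds(10))

val lines = ssc.socketTextStream("localhost", 9999) // placeholder source

// The two-argument transform passes the batch time along with the RDD.
val stamped = lines.transform { (rdd, batchTime: Time) =>
  rdd.flatMap(line => line.split(" ").map(word => (batchTime.milliseconds, word)))
}

stamped.print()
ssc.start()
ssc.awaitTermination()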
Hi,
I am using Spark Streaming to write the aggregated output as parquet files
to HDFS using SaveMode.Append. I have an external table created like:
CREATE TABLE IF NOT EXISTS rolluptable
USING org.apache.spark.sql.parquet
OPTIONS (
  path "hdfs:"
);
I had an impression that in case
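Not from the thread, but a sketch of the write side, assuming the Spark 1.4+ DataFrame writer; the source table, query, and path are placeholders. One follow-up worth knowing with an external parquet table is that its cached file listing must be refreshed before freshly appended files show up in queries.

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc) // sc: existing SparkContext
val aggregated = sqlContext.sql("SELECT key, count(*) AS cnt FROM events GROUP BY key")

// Append this batch of parquet files under the external table's path.
aggregated.write
  .mode(SaveMode.Append)
  .parquet("hdfs:///path/to/rolluptable") // hypothetical path

// Newly appended files may not be visible until the metadata is refreshed.
sqlContext.refreshTable("rolluptable")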
Hello,
Is there any way to query multiple collections from MongoDB using Spark and
Java? I want to create only one Configuration object. Please help if anyone
has something regarding this.
Thank you,
Abhishek
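I am not aware of a single-Configuration API for this in the mongo-hadoop connector, but a common workaround is one base Configuration cloned per collection. A sketch in Scala (the same calls exist from Java via JavaSparkContext.newAPIHadoopRDD); the database and collection names are hypothetical, and the mongo-hadoop connector is assumed to be on the classpath.

import org.apache.hadoop.conf.Configuration
import org.bson.BSONObject
import com.mongodb.hadoop.MongoInputFormat

// One base configuration; per-collection copies differ only in the input URI.
val base = new Configuration()

def collectionRDD(name: String) = {
  val conf = new Configuration(base) // copy, so the base stays untouched
  conf.set("mongo.input.uri", s"mongodb://localhost:27017/mydb.$name")
  sc.newAPIHadoopRDD(conf, classOf[MongoInputFormat], classOf[Object], classOf[BSONObject])
}

// Query two collections and combine them into a single RDD.
val combined = collectionRDD("users").union(collectionRDD("orders"))
println(combined.count())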
Anything using Spark RDDs?
Abhishek
From: Sandeep Giri [mailto:sand...@knowbigdata.com]
Sent: Friday, September 11, 2015 3:19 PM
To: Mishra, Abhishek; user@spark.apache.org; d...@spark.apache.org
Subject: Re: MongoDB and Spark
Use map-reduce.
On Fri, Sep 11, 2015, 14:32 Mishra, Abhishek
You had:
RDD.reduceByKey((x, y) => x + y)
RDD.take(3)
Maybe try:
val rdd2 = RDD.reduceByKey((x, y) => x + y)
rdd2.take(3)
-Abhishek-
On Aug 20, 2015, at 3:05 AM, satish chandra j jsatishchan...@gmail.com wrote:
Hi All,
I have data in an RDD as mentioned below:
RDD : Array[(Int, Int)] = Array((0,1
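For completeness, a self-contained version of the fix suggested above; the sample data is made up, since the original message is truncated. The point is that RDD transformations return new RDDs rather than mutating the receiver, so the result has to be assigned before take is called.

val rdd = sc.parallelize(Array((0, 1), (0, 2), (1, 20), (1, 30)))
val rdd2 = rdd.reduceByKey((x, y) => x + y)
rdd2.take(3).foreach(println) // e.g. (0,3), (1,50)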
Thanks Calvin - much appreciated!
-Abhishek-
On Aug 7, 2015, at 11:11 AM, Calvin Jia jia.cal...@gmail.com wrote:
Hi Abhishek,
Here's a production use case that may interest you:
http://www.meetup.com/Tachyon/events/222485713/
Baidu is using Tachyon to manage more than 100 nodes
Do people use Tachyon in production, or is it experimental grade still?
Regards,
Abhishek
execution parallelism).
[Disclaimer: I am no authority on Spark, but wanted to throw in my spin based on my
own understanding.]
Nothing official about it :)
-abhishek-
On Jul 31, 2015, at 1:03 PM, Sujit Pal sujitatgt...@gmail.com wrote:
Hello,
I am trying to run a Spark job that hits an external
Hello,
Please help me with links or documents for Apache Spark interview questions
and answers, and also for the tools related to it about which questions could be
asked.
Thanking you all.
Sincerely,
Abhishek
Hello Vaquar,
I have working knowledge and experience in Spark. I just wanted to test or do a
mock round to evaluate myself. Thank you for the reply;
please share something if you have anything on the same.
Sincerely,
Abhishek
From: vaquar khan [mailto:vaquar.k...@gmail.com]
Sent: Wednesday, July 29
Is it fair to say that Storm stream processing is completely in memory, whereas
Spark Streaming would take a disk hit because of how shuffle works?
Does Spark Streaming try to avoid disk usage out of the box?
-Abhishek
comparison for end-to-end performance. You could
take a look at this.
https://spark-summit.org/2015/events/towards-benchmarking-modern-distributed-streaming-systems/
On Tue, Jul 21, 2015 at 11:57 AM, Abhishek R. Singh
abhis...@tetrationanalytics.com wrote:
Is it fair to say that Storm stream
Could you use a custom partitioner to preserve boundaries such that all related
tuples end up on the same partition?
On Jun 30, 2015, at 12:00 PM, RJ Nowling rnowl...@gmail.com wrote:
Thanks, Reynold. I still need to handle incomplete groups that fall between
partition boundaries. So, I
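A sketch of the custom-partitioner idea in Scala; the integer group key and the partition count are hypothetical. Every tuple with the same group id hashes to the same partition, which is what keeps a group from straddling a partition boundary.

import org.apache.spark.Partitioner

class GroupPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int = key match {
    // Non-negative modulo so negative group ids still map into range.
    case groupId: Int => ((groupId % numPartitions) + numPartitions) % numPartitions
    case _            => 0
  }
}

val records = sc.parallelize(Seq((1, "a"), (1, "b"), (2, "c"), (2, "d")))
val grouped = records.partitionBy(new GroupPartitioner(4))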
I mostly use Amazon S3 for reading input data and writing output data for my
Spark jobs. I want to know the number of bytes read and written by my job
from S3.
In Hadoop there are FileSystemCounters for this; is there something similar
in Spark? If there is, can you please guide me on how to use
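Not an exact FileSystemCounters equivalent, but one approach is a listener over the task metrics; a sketch against the Spark 1.x API, where the input/output metrics are Options. These counters cover bytes moved through any Hadoop FileSystem, S3 included.

import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Sum bytes read/written across all completed tasks.
class ByteCountListener extends SparkListener {
  val bytesRead = new AtomicLong(0)
  val bytesWritten = new AtomicLong(0)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    Option(taskEnd.taskMetrics).foreach { m =>
      m.inputMetrics.foreach(im => bytesRead.addAndGet(im.bytesRead))
      m.outputMetrics.foreach(om => bytesWritten.addAndGet(om.bytesWritten))
    }
  }
}

val listener = new ByteCountListener
sc.addSparkListener(listener)
// ... run the job, then read listener.bytesRead.get and listener.bytesWritten.get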
Hi,
I'm using the CDH 5.4.0 QuickStart VM and tried to build Spark with Hive
compatibility so that I can run Spark SQL and access temp tables remotely.
I used the below command to build Spark; the build was successful, but when I
tried to access Hive data from Spark SQL, I got an error.
Thanks,
Abhi
guidance/help/pointers. Help appreciated.
-Abhishek-
I am no expert myself, but from what I understand, DataFrame is grandfathering
SchemaRDD. This was done for API stability as Spark SQL matured out of alpha as
part of the 1.3.0 release.
It is forward-looking and brings (DataFrame-like) syntax that was not available
with the older SchemaRDD.
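To make the "DataFrame-like syntax" concrete, a small sketch against the 1.3.0 API (the case class and data are made up): column expressions replace what previously needed SQL strings or raw Row handling on a SchemaRDD.

import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val people = sc.parallelize(Seq(Person("a", 30), Person("b", 20))).toDF()
people.filter($"age" > 21).groupBy($"age").count().show()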
On
Hi,
Thank you for your reply. It is surely going to help.
Regards,
Abhishek Dubey
From: Cheng, Hao [mailto:hao.ch...@intel.com]
Sent: Monday, March 02, 2015 6:52 PM
To: Abhishek Dubey; user@spark.apache.org
Subject: RE: Performance tuning in Spark SQL.
This is actually quite an open question
In the Spark Job Server *bin* folder, you will find the *application.conf*
file. Put:
context-settings {
  spark.cassandra.connection.host = "your address"
}
Hope this works.
There is a path, /tmp/spark-jobserver/file, where all the JARs are kept by
default. Deleting them from there should probably work.
On 11 Jan 2015 12:51, Sasi [via Apache Spark User List]
ml-node+s1001560n21081...@n3.nabble.com wrote:
How to remove submitted JARs from spark-jobserver?
Hey,
Why specifically Maven?
We set up Spark Job Server through sbt, which is an easy way to get the
job server up and running.
On 30 Dec 2014 13:32, Sasi [via Apache Spark User List]
ml-node+s1001560n20896...@n3.nabble.com wrote:
Does my question make sense, or does it require some elaboration?
Sasi
Ohh...
Just curious: we did a similar use case to yours, getting data out of
Cassandra. Since the job server is a REST architecture, all we need is a URL to
access it. Why does integrating with your framework matter here when all we
need is a URL?
On 30 Dec 2014 14:05, Sasi [via Apache Spark User List]
Frankly speaking, I have never tried this volume in practice, but I believe it
should work.
On 30 Dec 2014 15:26, Sasi [via Apache Spark User List]
ml-node+s1001560n20902...@n3.nabble.com wrote:
Thanks Abhishek. We understand your point and will try using the REST URL.
However, one concern: we had
Hi,
I have iplRDD, which is JSON, and I do the steps below and query through
HiveContext. I get the results, but without column headers. Is there a
way to get the column names?
val teamRDD = hiveContext.jsonRDD(iplRDD)
teamRDD.registerTempTable("teams")
hiveContext.cacheTable("teams")
val result
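Picking up roughly where the snippet cuts off, a sketch of getting the headers back (assuming Spark 1.3+, where the query returns a DataFrame):

val result = hiveContext.sql("SELECT * FROM teams")

result.printSchema()                   // column names and types
val headers = result.schema.fieldNames // Array[String] of column names
println(headers.mkString(", "))

result.show()                          // prints rows with column headers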
, I am unable to debug the same. Please guide me.
Thanks,
Abhishek
-Original Message-
From: Matei Zaharia [mailto:matei.zaha...@gmail.com]
Sent: Saturday, August 23, 2014 9:47 AM
To: Mishra, Abhishek
Cc: user@spark.apache.org
Subject: Re: Installation On Windows machine
You should
I got it right, Matei.
Thank you. I was giving the wrong directory path. Thank you...!!
Thanks,
Abhishek Mishra
-Original Message-
From: Mishra, Abhishek [mailto:abhishek.mis...@xerox.com]
Sent: Wednesday, August 27, 2014 4:38 PM
To: Matei Zaharia
Cc: user@spark.apache.org
Subject: RE
with my installation and usage. I want to run it with Java.
Looking forward to a reply,
Thanking you in advance,
Sincerely,
Abhishek
Thanks,
Abhishek Mishra
Software Engineer
Innovation Delivery CoE (IDC)
Xerox Services India
4th Floor Tapasya, Infopark,
Kochi, Kerala, India 682030
m +91-989-516
Hi,
I'm trying to install Spark along with Shark.
Here are the configuration details:
Spark 0.9.1
Shark 0.9.1
Scala 2.10.3
The Spark assembly was successful, but running sbt/sbt publish-local failed.
Please refer to the attached log for more details and advise.
Thanks,
Abhishek
Spark home: SPARK_HADOOP_VERSION=2.0.0
, Aaron Davidson ilike...@gmail.com wrote:
I suppose you actually ran publish-local and not "publish local" like
your example showed. That being the case, could you show the compile error
that occurs? It could be related to the Hadoop version.
On Sun, May 25, 2014 at 7:51 PM, ABHISHEK abhi