Hello Group,
I am having issues setting the stripe size, index stride, and index on an ORC
file using PySpark. I am getting approximately 2000 stripes for a 1.2GB file
when I expect only 5 stripes with the 256MB stripe-size setting.
I tried the options below:
1. Set the .options on the DataFrame writer. The
rray?
Also, is there a better way to send this output to the client?
Thanks,
Ashwin
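For reference, a minimal sketch of how these options might be passed on the DataFrameWriter; this assumes the standard ORC configuration key names, and the DataFrame `df` and output path are hypothetical:

```python
# 256 MB stripe size, expressed in bytes as the ORC writer expects
stripe_size = 256 * 1024 * 1024

# Hypothetical writer call (requires an active SparkSession and a DataFrame df;
# the option keys below are the standard ORC configuration names):
# df.write \
#     .option("orc.stripe.size", stripe_size) \
#     .option("orc.row.index.stride", 10000) \
#     .option("orc.create.index", "true") \
#     .orc("/tmp/orc_out")
```

Note that these keys are honored by the ORC writer, not by Spark itself, so behavior can vary across Spark/ORC versions.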
+dev mailing list (since I didn't get a response on the user DL)
On Tue, Feb 13, 2018 at 12:20 PM, Ashwin Sai Shankar <ashan...@netflix.com>
wrote:
Hi Spark users!
I noticed that spark doesn't allow python apps to run in cluster mode in
spark standalone cluster. Does anyone know the reason? I checked jira but
couldn't find anything relevant.
Thanks,
Ashwin
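For context, this is the kind of invocation that gets rejected; the master URL and script name are placeholders:

```shell
# Python apps on a standalone master are only accepted in client deploy mode;
# this cluster-mode submission is the case that fails:
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  app.py
```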
out which columns
need to be recomputed and which can be left as is.
Is there a best practice in the Spark ecosystem for this problem? Perhaps
some metadata system/data lineage system we can use? I'm curious if this is
a common problem that has already been addressed.
Thanks,
Ashwin
> be updated any more.
> See http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking
>
> On Mon, Aug 14, 2017 at 4:09 PM, Ashwin Raju <ther...@gmail.com> wrote:
>
>> Hi,
>>
>> I am running Spa
with outputMode("append"); however, the output only has the column
names, no rows. I was originally trying to output to Parquet, which only
supports append mode. I was seeing no data in my Parquet files, so I
switched to console output to debug, then noticed this issue. Am I
misunderstanding something about how append mode works?
Thanks,
Ashwin
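The behavior described above is consistent with how append mode interacts with watermarks on aggregations. A toy model (assuming 5-minute tumbling windows and a 10-minute watermark delay, both made-up numbers): a windowed aggregate is only emitted once the watermark, i.e. the max event time seen minus the delay, passes the *end* of the window. Until that happens, append mode emits nothing, which can make a console or Parquet sink look empty.

```python
from datetime import datetime, timedelta

DELAY = timedelta(minutes=10)   # assumed watermark delay
WINDOW = timedelta(minutes=5)   # assumed tumbling-window width

def window_start(t):
    # floor the event time to its 5-minute window start
    return t.replace(second=0, microsecond=0) - timedelta(minutes=t.minute % 5)

def finalized_windows(event_times):
    """Return the window starts that append mode would emit."""
    if not event_times:
        return []
    watermark = max(event_times) - DELAY
    return sorted({window_start(t) for t in event_times
                   if window_start(t) + WINDOW <= watermark})
```

With events at 12:01 and 12:20, the watermark is 12:10, so only the 12:00-12:05 window is emitted; the 12:20 window stays open, and a query whose event times never advance past window-end + delay emits no rows at all.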
would like to do instead:
def process(time, rdd):
    # create a DataFrame from the incoming RDD
    input_df = spark.createDataFrame(rdd)
    # run the shared DataFrame pipeline on it
    output_df = dataframe_pipeline_fn(input_df)
-ashwin
> ter/core/src/main/scala/org/apache/spark/ContextCleaner.scala
>
> On Mon, Mar 27, 2017 at 12:38 PM, Ashwin Sai Shankar <
> ashan...@netflix.com.invalid> wrote:
>
Hi!
In Spark on YARN, when are shuffle files on local disk removed? (Is it
when the app completes, once all the shuffle files are fetched, or at the
end of the stage?)
Thanks,
Ashwin
Thanks. I'll try that. Hopefully that should work.
On Mon, Jul 4, 2016 at 9:12 PM, Mathieu Longtin <math...@closetwork.org>
wrote:
> I started with a download of 1.6.0. These days, we use a self compiled
> 1.6.2.
>
> On Mon, Jul 4, 2016 at 11:39 AM Ashwin Raaghav <ashraag.
Longtin <math...@closetwork.org>
wrote:
> 1.6.1.
>
> I have no idea. SPARK_WORKER_CORES should do the same.
>
> On Mon, Jul 4, 2016 at 11:24 AM Ashwin Raaghav <ashraag...@gmail.com>
> wrote:
>
>> Which version of Spark are you using? 1.6.1?
>>
>
Which version of Spark are you using? 1.6.1?
Any ideas as to why it is not working in ours?
On Mon, Jul 4, 2016 at 8:51 PM, Mathieu Longtin <math...@closetwork.org>
wrote:
> 16.
>
> On Mon, Jul 4, 2016 at 11:16 AM Ashwin Raaghav <ashraag...@gmail.com>
> wrote:
>
> use more than 1 core per server. However, it seems it will
> start as many pyspark as there are cores, but maybe not use them.
>
> On Mon, Jul 4, 2016 at 10:44 AM Ashwin Raaghav <ashraag...@gmail.com>
> wrote:
>
>> Hi Mathieu,
>>
>> Isn't that the same as setting
node to 1. But the number of
>> pyspark.daemons process is still not coming down. It looks like initially
>> there is one Pyspark.daemons process and this in turn spawns as many
>> pyspark.daemons processes as the number of cores in the machine.
>>
>> Any help is apprecia
--
Regards,
Ashwin Raaghav
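The SPARK_WORKER_CORES suggestion from the thread above would be set in the worker's environment file; a sketch (path and value are illustrative):

```shell
# Hypothetical conf/spark-env.sh fragment on each worker node:
# cap the cores the standalone worker advertises, which bounds how many
# pyspark.daemon worker processes an executor can fork on that node.
export SPARK_WORKER_CORES=1
```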
Hi Vishnu,
A partition will either be in memory or on disk.
-Ashwin
On Feb 28, 2016 15:09, "Vishnu Viswanath" <vishnu.viswanat...@gmail.com>
wrote:
> Hi All,
>
> I have a question regarding Persistence (MEMORY_AND_DISK)
>
> Suppose I am trying to persist an RDD wh
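The whole-partition rule above (with MEMORY_AND_DISK, a partition that does not fit in memory is stored on disk in its entirety, never split across the two) can be sketched as a toy model; the sizes and memory budget are made up:

```python
# Toy model: place each partition wholly in memory until the budget is
# exhausted; any partition that does not fit goes wholly to disk.
def place_partitions(partition_sizes, memory_budget):
    placement, used = [], 0
    for size in partition_sizes:
        if used + size <= memory_budget:
            placement.append("memory")
            used += size
        else:
            placement.append("disk")
    return placement
```

For example, three 40MB partitions against a 100MB budget place as memory, memory, disk: the third partition spills entirely rather than putting 20MB in memory and 20MB on disk.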
synchronize these multiple streams.
What am I missing?
Thanks,
Ashwin
[1] http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf
Hi Bryan,
I see the same issue with 1.5.2. Can you please let me know what the
resolution was?
Thanks,
Ashwin
On Fri, Nov 20, 2015 at 12:07 PM, Bryan Jeffrey <bryan.jeff...@gmail.com>
wrote:
> Nevermind. I had a library dependency that still had the old Spark version.
>
> On Fr
We run large multi-tenant clusters with Spark/Hadoop workloads, and we use
YARN's preemption together with Spark's dynamic allocation to achieve
multi-tenancy. See the following link on how to enable/configure preemption
using the fair scheduler:
Hi,
When we run spark-sql, is there a way to get column names/headers with the
result?
--
Thanks,
Ashwin
Never mind, it's *set hive.cli.print.header=true*
Thanks !
On Fri, Dec 11, 2015 at 5:16 PM, Ashwin Shankar <ashwinshanka...@gmail.com>
wrote:
> Hi,
> When we run spark-sql, is there a way to get column names/headers with the
> result?
>
> --
> Thanks,
> Ashwin
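The setting can also be applied per invocation from the command line; a sketch (the query is a placeholder):

```shell
# Enable column headers for a single spark-sql run:
spark-sql -e "set hive.cli.print.header=true; SELECT 1 AS answer;"
```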
,
Ashwin
On Fri, Jul 31, 2015 at 4:52 PM, Brandon White bwwintheho...@gmail.com
wrote:
Since one input DStream creates one receiver, and one receiver uses one
executor/node, what happens if you create more DStreams than nodes in the
cluster? Say I have 30 DStreams on a 15-node cluster.
are creating 500 Dstreams based off 500 textfile
directories, do we need at least 500 executors / nodes to be receivers for
each one of the streams?
On Tue, Jul 28, 2015 at 6:09 PM, Tathagata Das t...@databricks.com
wrote:
@Ashwin: You could append the topic in the data.
val kafkaStreams
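TD's suggestion above (append the topic to each record before unioning the streams, so one consolidated stream can still tell topics apart) reduces to something like this; the topic names and records are made up:

```python
# Tag each record with its source topic, then union into a single
# stream-like list; downstream code keys off the tag instead of the stream.
per_topic = {
    "clicks": ["c1", "c2"],
    "views": ["v1"],
}
unioned = [(topic, record)
           for topic, records in per_topic.items()
           for record in records]
```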
,
then an optimal configuration would be:
--num-executors 8 --executor-cores 2 --executor-memory 2G
Thanks,
Ashwin
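The sizing above would be passed on the submit command; a sketch (the master, and app script are placeholders):

```shell
# Hypothetical YARN submission using the executor sizing discussed above:
spark-submit \
  --master yarn \
  --num-executors 8 \
  --executor-cores 2 \
  --executor-memory 2G \
  app.py
```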
On Thu, Jul 30, 2015 at 12:08 PM, unk1102 umesh.ka...@gmail.com wrote:
Hi I have one Spark job which runs fine locally with less data but when I
schedule it on YARN to execute I keep
? What is the best way to parallelize this? Any other ideas on
design?
--
Thanks Regards,
Ashwin Giridharan
to hostmachine's ip/port. So the AM can then talk
hostmachine's ip/port, which would be mapped
to the container.
Thoughts ?
--
Thanks,
Ashwin
see the following:
log4j: Setting property [file] to [].
log4j: setFile called: , true
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: (No such file or directory)
at java.io.FileOutputStream.open(Native Method)
--
Thanks,
Ashwin
Hi,
In Spark on YARN, when running spark_shuffle as an auxiliary service on the
node manager, do the map spills of a stage get cleaned up once the next
stage completes, or are they preserved till the app completes (i.e., until
all the stages complete)?
--
Thanks,
Ashwin
,
Ashwin
making sure, but are you looking for the tar in the assembly/target dir?
On Wed, Nov 12, 2014 at 3:14 PM, Ashwin Shankar ashwinshanka...@gmail.com
wrote:
Hi,
I just cloned Spark from GitHub and I'm trying to build it to generate a
tarball.
I'm doing : mvn -Pyarn -Phadoop-2.4 -Dhadoop.version
isolation ?
I know I'm asking a lot of questions. Thanks in advance :) !
--
Thanks,
Ashwin
Netflix
, will the application progress with
the remaining resources/fair share?
I'm new to Spark, sorry if I'm asking something very obvious :).
Thanks,
Ashwin
On Wed, Oct 22, 2014 at 12:07 PM, Marcelo Vanzin van...@cloudera.com
wrote:
Hi Ashwin,
Let me try to answer to the best of my knowledge.
On Wed