Hi,
Just found out that ShuffleMapTask has transient locs and
preferredLocs attributes, which means that when a ShuffleMapTask is
serialized (as a broadcast variable) that information is gone.
Does this mean the attributes could have been left out entirely,
since Spark uses SortShuffleManager
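(To see concretely why a transient field "disappears", here is a toy
round-trip through Java serialization; Task below is a stand-in class of my
own, not Spark's ShuffleMapTask:)

import java.io._

// Stand-in class, not Spark's ShuffleMapTask.
class Task(@transient val preferredLocs: Seq[String]) extends Serializable

object TransientDemo extends App {
  val buf = new ByteArrayOutputStream()
  new ObjectOutputStream(buf).writeObject(new Task(Seq("host1")))

  val back = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    .readObject().asInstanceOf[Task]

  println(back.preferredLocs)  // prints "null": the transient field is gone
}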
nope, no changes to jenkins in the past few months. ganglia graphs
show higher, but not worrying, memory usage on the workers when the
jobs failed...
i'll take a closer look later tonight/first thing tomorrow morning.
shane
On Tue, Jan 3, 2017 at 4:35 PM, Kay Ousterhout
Actually, I think UDTs can translate an object directly into Spark's
internal format via ScalaReflection and an encoder, without the intermediate
generic row. You can directly create a Dataset of UDT objects.
If you don't convert the Dataset to a DataFrame, I think RowEncoder won't
step in.
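A quick way to see the two paths from the shell (a minimal sketch; Point is
a plain case class standing in for a UDT-annotated one):

import spark.implicits._

case class Point(x: Double, y: Double)

// Dataset[Point]: ScalaReflection derives an encoder that writes the objects
// straight into Spark's internal binary format, with no intermediate generic Row.
val ds = Seq(Point(1.0, 2.0), Point(3.0, 4.0)).toDS()

// Only going untyped (a DataFrame, i.e. Dataset[Row]) brings RowEncoder in.
val df = ds.toDF()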
Hi Jacek,
I'm not entirely sure I understand your question, but the reason
preferredLocs can be transient is because it is only used by the scheduler
(on the driver) to decide where it should prefer to assign the task. But no
matter the value, the task could still get assigned anywhere. By the time that
Hi Shuai,
Disclaimer: I'm not a Spark guru, and what's written below is from some
notes I took while reading the Spark source code, so I could be wrong, in
which case I'd appreciate it a lot if someone could correct me.
(Yes, I did copy your disclaimer since it applies to me too. Sorry for
the duplication :))
Thanks Herman for the explanation.
I'll quietly assume that the other points were OK since you did not object.
Correct?
Best regards,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at
@Jacek The maximum of 200 output fields for whole-stage code generation was
chosen to prevent the generated method from exceeding the JVM's 64KB
per-method bytecode limit. There is absolutely no relation between this value
and the number of partitions after a shuffle (if there were, they should have used the
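(The two 200s come from unrelated settings; a sketch, assuming the Spark 2.1
config names, with illustrative values:)

// Field-count cutoff for whole-stage code generation:
spark.conf.set("spark.sql.codegen.maxFields", "200")
// Number of partitions after a shuffle -- a completely separate knob:
spark.conf.set("spark.sql.shuffle.partitions", "200")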
Hi Reynold Xin,
I tried setting spark.sql.files.ignoreCorruptFiles = true using these commands:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.setConf("spark.sql.files.ignoreCorruptFiles", "true") /
sqlContext.sql("set spark.sql.files.ignoreCorruptFiles=true")
but still
I'm also in favor of this. Thanks for your persistence, Cody.
My take on the specific issues Joseph mentioned:
1) voting vs. consensus -- I agree with the argument Ryan Blue made earlier
for consensus:
> Majority vs consensus: My rationale is that I don't think we want to
> consider a proposal
You might also be interested in this:
https://issues.apache.org/jira/browse/SPARK-19031
On Tue, Jan 3, 2017 at 3:36 PM, Michael Armbrust wrote:
> I think we should add something similar to mapWithState in 2.2. It would
> be great if you could add the description of your
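(For context, the existing DStream-side mapWithState being referenced; a
minimal runnable sketch, where the socket host/port and checkpoint path are
placeholder values:)

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("mapWithState-demo")
val ssc = new StreamingContext(conf, Seconds(1))
ssc.checkpoint("/tmp/checkpoint")  // mapWithState requires a checkpoint dir

val words = ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split(" "))
  .map((_, 1))

// Running count per key; `state` survives across batches.
val spec = StateSpec.function { (word: String, one: Option[Int], state: State[Int]) =>
  val count = state.getOption.getOrElse(0) + one.getOrElse(0)
  state.update(count)
  (word, count)
}
words.mapWithState(spec).print()

ssc.start()
ssc.awaitTermination()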
I don't have a concern about voting vs. consensus.
My concern is that, whatever the decision-making process is, it should be
explicitly announced on the ticket for the given proposal, with an explicit
deadline and an explicit outcome.
On Tue, Jan 3, 2017 at 4:08 PM, Imran Rashid
Yes! Using Spark 2.1. I hope I am using the right syntax for setting the conf.
sqlContext.setConf("spark.sql.files.ignoreCorruptFiles","true") /
sqlContext.sql("set spark.sql.files.ignoreCorruptFiles=true")
Sent from my Samsung Galaxy smartphone.
-------- Original message -------- From: Ryan Blue
Yes! Using Spark 2.1.0. I hope the command used to set the conf is correct.
sqlContext.setConf("spark.sql.files.ignoreCorruptFiles","true") /
sqlContext.sql("set spark.sql.files.ignoreCorruptFiles=true")
Hi Cody,
Thanks for being persistent about this. I too would like to see this
happen. Reviewing the thread, it sounds like the main things remaining are:
* Decide about a few issues
* Finalize the doc(s)
* Vote on this proposal
Issues & TODOs:
(1) The main issue I see above is voting vs.
The jira: https://issues.apache.org/jira/browse/SPARK-17629
Adding new methods could result in method clutter. Changing the behavior of
non-experimental classes is unfortunate (ml Word2Vec was marked
Experimental until Spark 2.0). Neither option is great. If I had to pick, I
would rather change the
Khyati,
Are you using Spark 2.1? The usual entry point for Spark 2.x is spark
rather than sqlContext.
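Something like this is what I'd expect to work there (a sketch; the SET form
is equivalent):

// `spark` is the SparkSession the 2.x shell provides:
spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")
// or, equivalently:
spark.sql("SET spark.sql.files.ignoreCorruptFiles=true")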
rb
On Tue, Jan 3, 2017 at 11:03 AM, khyati wrote:
> Hi Reynold Xin,
>
> I tried setting spark.sql.files.ignoreCorruptFiles = true using these
> commands:
>
> val
Chetan,
Spark currently uses Hive 1.2.1 to interact with the metastore. Using
that version of Hive is going to be the most reliable, but the metastore
API doesn't change very often, and we've found (from having run different
versions as well) that older versions are mostly compatible. Some things
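(If you do need to point Spark at a different metastore client version,
these are the relevant configs; a sketch with example values:)

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("metastore-demo")
  .config("spark.sql.hive.metastore.version", "1.2.1")  // metastore client version
  .config("spark.sql.hive.metastore.jars", "builtin")   // where to load the client jars from
  .enableHiveSupport()
  .getOrCreate()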