by
plenty I'm sure :) (and would make my implementation more straightforward - the
state management is painful atm).
James
On Wed, 30 Aug 2017 at 14:56 Reynold Xin
<r...@databricks.com> wrote:
Sure that's good to do (and as discussed earlier a good comp
personal
slant is that it's more important to improve support for other datastores than
it is to lower the barrier of entry - this is why I've been pushing here.
James
On Wed, 30 Aug 2017 at 09:37 Ryan Blue
<rb...@netflix.com> wrote:
-1 (non-binding)
Someti
sketch out something here
if that'd be useful?
James
On Tue, 29 Aug 2017 at 18:59 Wenchen Fan
<cloud0...@gmail.com> wrote:
Hi James,
Thanks for your feedback! I think your concerns are all valid, but we need to
make a tradeoff here.
> Explicitly h
Java class structure works, but otherwise I can just throw).
James
On Tue, 29 Aug 2017 at 02:56 Reynold Xin
<r...@databricks.com> wrote:
James,
Thanks for the comment. I think you just pointed out a trade-off between
expressiveness and API simplicity
supported pushdown stuff, and then the user can
transform and return it.
I think this ends up being a more elegant API for consumers, and also far more
intuitive.
James
On Mon, 28 Aug 2017 at 18:00 蒋星博
<jiangxb1...@gmail.com> wrote:
+1 (Non-binding)
-1
This bug (SPARK-16515) in Spark 2.0 breaks cases of ours that run fine on 1.6.
Hi Spark guys,
I am trying to run Spark SQL using bin/spark-sql with Spark 2.0 master
code (commit ba181c0c7a32b0e81bbcdbe5eed94fc97b58c83e) but ran across an
issue: it always connects to a local Derby database and can't connect to my
existing Hive metastore database. Could you help me check what's the
This may be related to: https://issues.apache.org/jira/browse/SPARK-13773
Regards,
James
On 11 May 2016 at 15:49, Ted Yu <yuzhih...@gmail.com> wrote:
> In master branch, behavior is the same.
>
> Suggest opening a JIRA if you haven't done so.
>
> On Wed, May 11, 2016
I guess a different workload causes a different result?
Hi,
I also hit the 'Unable to acquire memory' issue using Spark 1.6.1 with dynamic
allocation on YARN. My case happened with spark.sql.shuffle.partitions set
larger than 200. From the error stack, it differs from the issue reported by
Nezih, and I'm not sure whether they share the same root cause.
Thanks
James
Thank you again for the suggestions.
On Tue, Feb 23, 2016 at 9:28 PM, Zhan Zhang <zzh...@hortonworks.com> wrote:
> Hi James,
>
> You can try to write with another format, e.g., parquet, to see whether it
> is an ORC-specific issue or a more generic one.
>
> Thanks.
>
> Zhan Z
I'm trying to write an ORC file after running the FPGrowth algorithm on a
dataset of around just 2GB in size. The algorithm performs well and can
display results if I take(n) the freqItemSets() of the result after
converting that to a DF.
I'm using Spark 1.5.2 on HDP 2.3.4 and Python 3.4.2 on
+1
1) Build binary instruction: ./make-distribution.sh --tgz --skip-java-test
-Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver
-DskipTests
2) Run Spark SQL with YARN client mode
This 1.5.1 RC1 package has better test results than the previous 1.5.0, except
for a critical bug: https://issues.apache.org/jira/browse/SPARK-10474
(Aggregation failed with unable to acquire memory)
I saw a new "spark.shuffle.manager=tungsten-sort" implemented in
https://issues.apache.org/jira/browse/SPARK-7081, but it can't be found its
corresponding description in
http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/configuration.html(Currenlty
there are only 'sort' and
Based on the latest Spark code (commit
608353c8e8e50461fafff91a2c885dca8af3aaa8), I used the same Spark SQL query
to test two groups of combined configurations, and it seems that it currently
doesn't work well with the tungsten-sort shuffle manager, per the results below:
*Test 1# (PASSED)*
Thank you for your reply!
Do you mean that currently, if I want to use this Tungsten feature, I have to
set the sort shuffle manager (spark.shuffle.manager=sort), right? However, I
saw a slide, "Deep Dive into Project Tungsten: Bringing Spark Closer to Bare
Metal", published at Spark Summit 2015, and it
I tried to enable Tungsten with Spark SQL and set the 3 parameters below, but
I found that Spark SQL always hangs at the point below. Could you please point
me to the potential cause? I'd appreciate any input.
spark.shuffle.manager=tungsten-sort
spark.sql.codegen=true
spark.sql.unsafe.enabled=true
Another error:
15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send
map output locations for shuffle 3 to bignode1:40443
15/07/31 16:15:28 INFO spark.MapOutputTrackerMaster: Size of output statuses
for shuffle 3 is 583 bytes
15/07/31 16:15:28 INFO
My code
```
// Initialize the graph: assign each vertex a counter containing only its own id
var anfGraph = graph.mapVertices { case (vid, _) =>
  val counter = new HyperLogLog(5)
  counter.offer(vid)
  counter
}

// Find a triplet whose source attribute is null
val nullVertex = anfGraph.triplets.filter(edge => edge.srcAttr == null).first()
// => NullPointerException
```
I found that some vertex attributes in some triplets are null, but not all.
Alcaid
2015-02-13 14:50 GMT+08:00 Reynold Xin r...@databricks.com:
Then maybe you actually had a null in your vertex attribute?
On Thu, Feb 12, 2015 at 10:47 PM, James alcaid1...@gmail.com wrote
On Thu, Feb 12, 2015 at 10:47 PM, James alcaid1...@gmail.com wrote:
I changed the mapReduceTriplets() func to aggregateMessages(), but it
still failed.
2015-02-13 6:52 GMT+08:00 Reynold Xin r...@databricks.com:
Can you use the new aggregateMessages method? I suspect the null is
coming from
need the src or dst vertex data. Occasionally it can fail to detect that. In
the new aggregateMessages API, the caller needs to explicitly specify that,
making it more robust.
On Thu, Feb 12, 2015 at 6:26 AM, James alcaid1...@gmail.com wrote:
Hello,
When I am running the code on a much bigger
is appreciated.
Alcaid
2015-02-11 19:30 GMT+08:00 James alcaid1...@gmail.com:
Hello,
Recently I have been trying to estimate the average distance of a big graph
using Spark with the help of [HyperANF](http://dl.acm.org/citation.cfm?id=1963493).
It works like the Connected Components algorithm, while
Hello,
Recently I have been trying to estimate the average distance of a big graph
using Spark with the help of [HyperANF](http://dl.acm.org/citation.cfm?id=1963493).
It works like the Connected Components algorithm, while the attribute of a
vertex is a HyperLogLog counter such that at the k-th iteration it
Hi all,
When I was trying to write a test for my Spark application I hit:
```
Error:(14, 43) not found: type LocalSparkContext
class HyperANFSuite extends FunSuite with LocalSparkContext {
```
In the spark-core source code I could not find LocalSparkContext, so I
wonder how to write a test
LocalSparkContext, but since the test
classes aren't included in Spark packages, you'll also need to package them
up in order to use them in your application (viz., outside of Spark).
best,
wb
- Original Message -
From: James alcaid1...@gmail.com
To: dev@spark.apache.org
Sent
Recently we have wanted to use Spark to calculate the average shortest-path
distance between each reachable pair of nodes in a very big graph.
Has anyone ever tried this? We hope to discuss the problem.
For performance, will foreign data formats be supported the same as native ones?
Thanks,
James
On Wed, Oct 8, 2014 at 11:03 PM, Cheng Lian lian.cs@gmail.com wrote:
The foreign data source API PR also matters here
https://www.github.com/apache/spark/pull/2475
Foreign data sources like ORC can
these APIs use will be the same as that for data sources included in the
core Spark SQL library.
Michael
On Thu, Oct 9, 2014 at 2:18 PM, James Yu jym2...@gmail.com wrote:
For performance, will foreign data formats be supported the same as native ones?
Thanks,
James
On Wed, Oct 8, 2014 at 11:03 PM
I didn't see anyone ask this question before, but I was wondering if anyone
knows whether Spark/Spark SQL will support the ORCFile format soon? ORCFile is
getting more and more popular in the Hive world.
Thanks,
James